Data science is an interdisciplinary field that uses scientific methods,
processes, algorithms and systems to extract knowledge and insights from
data. It has been described as a concept to unify "statistics, data
analysis and their related methods" in order to "understand and analyze
actual phenomena" with data. It uses techniques and theories drawn from
many fields within the context of mathematics, statistics, computer
science and information science. Jim Gray argued that "everything about
science is changing because of the impact of information technology".
Artificial intelligence (AI) is intelligence demonstrated by machines, as
opposed to the natural intelligence displayed by humans and animals, which
involves consciousness and emotionality. The distinction between the former
and the latter categories is often revealed by the acronym chosen. 'Strong' AI
is usually labelled as AGI (Artificial General Intelligence), while attempts to
emulate 'natural' intelligence have been called ABI (Artificial Biological
Intelligence). Leading AI textbooks define the field as the study of "intelligent
agents": any device that perceives its environment and takes actions that
maximize its chance of successfully achieving its goals. Colloquially, the term
"artificial intelligence" is often used to describe machines (or computers) that
mimic "cognitive" functions that humans associate with the human mind,
such as "learning" and "problem solving". Artificial intelligence was founded
as an academic discipline in 1956, and in the years since has experienced
several waves of optimism, followed by disappointment and the loss of
funding (known as an "AI winter"), followed by new approaches, success and
renewed funding.
Machine learning (ML) is the study of computer algorithms that improve
automatically through experience. It is seen as a part of artificial intelligence.
Machine learning algorithms are used in a
wide variety of applications, such as email filtering and computer vision,
where it is difficult or infeasible to develop conventional algorithms to
perform the needed tasks. A subset of machine learning is closely related to
computational statistics, which focuses on making predictions using
computers. Data mining is a related field of
study, focusing on exploratory data analysis through unsupervised learning.
Deep learning (also known as deep structured learning) is part of a broader
family of machine learning methods based on artificial neural networks with
representation learning. Learning can be supervised, semi-supervised or
unsupervised. Deep-learning architectures such as deep neural networks,
deep belief networks, recurrent neural networks and convolutional neural
networks have been applied to fields such as computer vision and natural
language processing, where they have produced results comparable to and in
some cases surpassing human expert performance. Artificial neural networks
(ANNs) were inspired by information processing in
biological systems. ANNs have various differences from biological brains.
Specifically, neural networks tend to be static and symbolic, while the
biological brain of most living organisms is dynamic and analog.


STATISTICS 


Statistics is the discipline that concerns the collection,
organization, analysis, interpretation, and presentation of
data. In applying statistics to a scientific, industrial, or social
problem, it is conventional to begin with a statistical population or a
statistical model to be studied. Populations can be diverse groups of
people or objects such as "all people living in a country" or "every
atom composing a crystal". Statistics deals with every aspect of data,
including the planning of data collection in terms of the design of
surveys and experiments.
CATEGORIES IN STATISTICS


• Descriptive Statistics
• Inferential Statistics


DESCRIPTIVE STATISTICS


Descriptive statistics uses the data to provide
descriptions of the population, either through numerical
calculations or through graphs or tables.


Descriptive statistics helps organize data and focuses on the
characteristics of the data, summarizing them as parameters.


Suppose you want to study the average height of students in a
classroom. In descriptive statistics, you would record the heights of
all students in the class and then find the
maximum, minimum and average height of the class.
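
The classroom example can be sketched in a few lines of Python. This is a minimal illustration, assuming a made-up list of heights (`heights_cm` is hypothetical data, not taken from the slides):

```python
from statistics import mean

# Hypothetical heights (in cm) of every student in a small class.
heights_cm = [150.0, 162.5, 171.0, 158.0, 180.5, 166.0]

# Descriptive statistics: summarize the whole class directly.
summary = {
    "min": min(heights_cm),
    "max": max(heights_cm),
    "mean": mean(heights_cm),
}
print(summary)
```

Because every student is measured, no probability is involved: the numbers describe the class exactly.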
Inferential statistics makes inferences and predictions about a
population based on a sample of data taken from the population
in question.
Inferential statistics generalizes from a sample to a larger data set and
applies probability to arrive at a conclusion. It allows you to infer
parameters of the population from sample statistics and build models on them.
If we consider the same example of finding the average height of
students in a class, in inferential statistics you take a sample of the
class, that is, a few people from the whole class. You have already
grouped the class into tall, average and short. In this method, you
build a statistical model on the sample and extend it to the whole
population of the class.
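
As a rough sketch of this idea, the snippet below estimates a class average from a sample only. The population, the sample size of 30, and the normal-approximation interval (z = 1.96) are all assumptions made for illustration:

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 200 students.
population = [random.gauss(165.0, 8.0) for _ in range(200)]

# Measure only a random sample of 30 students.
sample = random.sample(population, 30)
sample_mean = mean(sample)
standard_error = stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval for the population mean
# (normal approximation, z = 1.96).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error
print(f"sample mean = {sample_mean:.1f} cm, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

The interval expresses the uncertainty that comes from measuring a few students instead of all of them.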
LINEAR ALGEBRA


Linear algebra is central to almost all areas of mathematics.
For instance, linear algebra is fundamental in modern
presentations of geometry, including for defining basic
objects such as lines, planes and rotations. Also, functional
analysis, a branch of mathematical analysis, may be viewed as
the application of linear algebra to spaces of
functions. Linear algebra is also used in most sciences and
fields of engineering, because it allows modeling many natural
phenomena, and computing efficiently with such models. For
nonlinear systems, which cannot be modeled with linear
algebra, it is often used for dealing with first-order
approximations, using the fact that the differential of a
multivariate function at a point is the linear map that best
approximates the function near that point.
INTRODUCTION


If Data Science were Batman, Linear Algebra
would be Robin. This faithful sidekick is often ignored. But
in reality, it powers major areas of Data Science,
including the hot fields of Natural Language Processing and
Computer Vision.


CATEGORIZED 





• Machine Learning
• Dimensionality Reduction
• Natural Language Processing (NLP)
• Computer Vision


WHY STUDY LINEAR 
ALGEBRA? 


Suppose you want to reduce the dimensions of your data
using Principal Component Analysis (PCA). How would you
decide how many principal components to keep if
you did not know how that choice would affect
your data? Clearly, you need to know the
mechanics of the algorithm to make this decision.





LINEAR ALGEBRA IN 
MACHINE LEARNING 


Loss functions 
Regularization 
Covariance Matrix 
Support Vector Machine Classification 
You should be fairly familiar with how a model, say a Linear
Regression model, fits a given dataset:

• You start with some arbitrary prediction function
  (a linear function for a Linear Regression model)
• Use it on the independent features of the data to predict the output
• Calculate how far the predicted output is from the actual output
• Use these calculated values to optimize your prediction
  function using some strategy like Gradient Descent
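
The steps above can be sketched for a one-feature Linear Regression. This is a minimal illustration, assuming toy data generated from y = 2x + 1, a mean squared error loss, and a hand-picked learning rate; none of these values come from the slides:

```python
# Toy data generated from y = 2x + 1 (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # arbitrary starting prediction function y = w*x + b
learning_rate = 0.05

for _ in range(5000):
    # Predict, then measure how far each prediction is from the actual output.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # Step against the gradient to reduce the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

After enough iterations the parameters settle near the slope and intercept used to generate the data.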

L1 Norm: Also known as the Manhattan Distance or Taxicab
Norm. The L1 Norm is the distance you would travel if you went from
the origin to the vector when the only permitted directions
are parallel to the axes of the space. In a 2D space, you could reach
the vector (3, 4) by travelling 3 units
along the x-axis and then 4 units parallel to the y-axis.
Or you could travel 4 units along the y-axis first
and then 3 units parallel to the x-axis. In either case, you
travel a total of 7 units.


L2 Norm: Also known as the Euclidean Distance. The L2 Norm is the
shortest distance of the vector from the origin. This distance is
calculated using the Pythagorean theorem (I can see the old
math concepts flickering on in your mind!). It is the square root
of (3^2 + 4^2), which is equal to 5.

But how is the norm used to find
the difference between the predicted values and the expected
values? Let's say the predicted values are stored in a vector P and
the expected values are stored in a vector E. Then P - E is the
difference vector, and the norm of P - E is the total loss for the
prediction.
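
A small sketch of both norms, reusing the vector (3, 4) from the text. The helper names `l1_norm` and `l2_norm` and the values of P and E are illustrative assumptions:

```python
import math

def l1_norm(v):
    # Sum of absolute values: distance travelled parallel to the axes.
    return sum(abs(x) for x in v)

def l2_norm(v):
    # Square root of the sum of squares: straight-line distance.
    return math.sqrt(sum(x * x for x in v))

v = [3, 4]
print(l1_norm(v))  # 7
print(l2_norm(v))  # 5.0

# Loss as the norm of the difference vector P - E (hypothetical values).
P = [2.5, 0.0, 2.0, 8.0]   # predicted
E = [3.0, -0.5, 2.0, 7.0]  # expected
diff = [p - e for p, e in zip(P, E)]
print(l1_norm(diff), l2_norm(diff))
```

The L1 loss adds up the absolute errors, while the L2 loss is dominated by the largest errors because they are squared before summing.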


L1 NORM


Manhattan Distance or L1 Norm


L1 Norm of vector $v = (v_1, v_2, \dots, v_n)$: $\|v\|_1 = |v_1| + |v_2| + \dots + |v_n|$


L2 NORM


Euclidean Distance or L2 Norm


L2 Norm of vector $v = (v_1, v_2, \dots, v_n)$: $\|v\|_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$
REGULARIZATION


Regularization is a very important concept in data science. It is a
technique we use to prevent models from overfitting. Regularization is
actually another application of the Norm. A model is said to overfit when it
fits the training data too well. Such a model does not
perform well with new data because it has learned even
the noise in the training data. It will not be
able to generalize to data that it has not seen
before.

Regularization penalizes overly complex models by adding
the norm of the weight vector to the cost function. Since we want
to minimize the cost function, we also need to minimize this
norm. This causes unneeded components of the weight vector to shrink to
0 and prevents the prediction function from being overly
complex.
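
The penalty can be sketched as follows. This is an illustrative cost function, assuming a mean squared error data term and a penalty strength `lam`; the function names and numbers are hypothetical. Note that the L2 variant below uses the squared L2 norm, as is common in ridge regression, which is a slight departure from adding the norm itself:

```python
def mse(predicted, expected):
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)

def regularized_cost(predicted, expected, weights, lam, norm="l2"):
    """Data loss plus lam times a penalty on the weight vector."""
    if norm == "l1":
        penalty = sum(abs(w) for w in weights)      # L1 norm (lasso-style)
    else:
        penalty = sum(w * w for w in weights)       # squared L2 norm (ridge-style)
    return mse(predicted, expected) + lam * penalty

predicted = [2.5, 0.0, 2.0]
expected = [3.0, -0.5, 2.0]
weights = [0.5, -3.0, 0.0]

print(regularized_cost(predicted, expected, weights, lam=0.0))
print(regularized_cost(predicted, expected, weights, lam=0.1, norm="l1"))
print(regularized_cost(predicted, expected, weights, lam=0.1, norm="l2"))
```

With `lam=0` the cost is the plain loss; increasing `lam` makes large weights more expensive, which pushes the optimizer towards simpler models.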
Bivariate analysis is an important step in data
exploration. We want to study the relationship between pairs of
variables. Covariance and correlation are measures used to study
relationships between two continuous variables. Covariance indicates the
direction of the linear relationship between the variables. A positive
covariance indicates that an increase or decrease in one variable is
accompanied by the same in the other. A negative covariance indicates
that an increase or decrease in one is accompanied by the opposite in the
other.

Correlation, on the other hand, is the standardized value of covariance. A
correlation value tells us both the strength and the direction of the
linear relationship and ranges from -1 to 1. Now, you might be
thinking that this is a concept of Statistics and not Linear Algebra. Well,
remember I told you Linear Algebra is all-pervasive?
Using the concepts of transpose and matrix multiplication in Linear Algebra,
we have a pretty neat expression for the covariance matrix:

$\mathrm{cov}(X) = \frac{1}{m-1} X^T X$

Here, X is the standardized data matrix containing all numerical features,
and m is its number of rows (observations).
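
A minimal sketch of this expression in plain Python, assuming a small made-up data set with two features. Because each column is standardized first, the resulting matrix has ones on the diagonal and the correlation between the features off the diagonal:

```python
from statistics import mean, stdev

# Hypothetical data: rows are observations, columns are two features.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
m = len(data)
cols = list(zip(*data))
n = len(cols)

# Standardize each column: subtract the mean, divide by the sample std.
X = [[(row[j] - mean(cols[j])) / stdev(cols[j]) for j in range(n)]
     for row in data]

# cov = X^T X / (m - 1), written out as explicit sums.
cov = [[sum(X[r][i] * X[r][j] for r in range(m)) / (m - 1) for j in range(n)]
       for i in range(n)]
for row in cov:
    print([round(v, 4) for v in row])
```

In practice a numerical library would do the matrix product; the explicit sums are only meant to show what the transpose-times-matrix product computes.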


[Figure: scatter plots illustrating strong positive correlation, moderate
positive correlation, no correlation, moderate negative correlation,
strong negative correlation, and a curvilinear relationship]
Ah yes, Support Vector Machines. One of the most
common classification algorithms that regularly produces impressive
results. It is an application of the concept of Vector Spaces in Linear Algebra.
Support Vector Machine, or SVM, is a discriminative classifier that
works by finding a decision surface. It is a supervised machine learning
algorithm. In this algorithm, we plot each data item as a point in an
n-dimensional space (where n is the number of features you have) with
the value of each feature being the value of a particular
coordinate. Then, we perform classification by finding the
hyperplane that separates the two classes very well, that is, with
the maximum margin, which is hyperplane C in this case.





• A hyperplane is a subspace whose dimension is one less than
that of its corresponding vector space, so it would be a straight line for a 2D
vector space, a 2D plane for a 3D vector space, and so on. Again, the Vector Norm
is used to calculate the margin.

• But what if the data is not linearly separable? Our intuition says that
the decision surface has to be a circle or an
ellipse, right? But how do you find it? Here, the concept of Kernel
Transformations comes into play. The idea of transformation from one
space to another is very common in Linear Algebra.





Let's introduce a variable z = x^2 + y^2. This is how the data looks if we
plot it along the z and x-axes:

Now, this is clearly linearly separable by a line z = a, where a is some
positive constant. On transforming back to the original space, we get
x^2 + y^2 = a as the decision surface, which is a circle!
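
The transformation can be checked numerically. The sketch below assumes two made-up classes lying on circles of radius 0.5 and 2.0 and a threshold a = 1.0; these values are illustrative only:

```python
import math

# Hypothetical 2D points: class 0 lies on a small circle around the origin,
# class 1 on a larger circle, so no straight line in (x, y) separates them.
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in range(0, 6)]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in range(0, 6)]

def z(point):
    x, y = point
    return x ** 2 + y ** 2   # the new feature z = x^2 + y^2

a = 1.0  # threshold: the line z = a in the transformed space

inner_side = [z(p) < a for p in inner]
outer_side = [z(p) > a for p in outer]
print(all(inner_side), all(outer_side))
```

Every inner point falls below the line z = a and every outer point above it, so a single threshold in the transformed space plays the role of the circular boundary in the original space.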


LINEAR ALGEBRA IN 
DIMENSIONALITY 
REDUCTION 


• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)


Dimensionality reduction, or dimension reduction, is the transformation of
data from a high-dimensional space into a low-dimensional
space so that the low-dimensional representation retains some
meaningful properties of the original data, ideally close to its intrinsic
dimension. Working in high-dimensional spaces can be undesirable
for many reasons; raw data are often sparse as a
consequence of the curse of dimensionality, and analyzing the
data is usually computationally intractable. Dimensionality
reduction is common in fields that deal with large numbers of observations
and/or large numbers of variables, such as signal processing,
speech recognition, neuroinformatics, and bioinformatics.
Principal Component Analysis, or PCA, is an unsupervised dimensionality
reduction technique. PCA finds the directions of maximum variance and
projects the data along them to reduce the dimensions. These directions
are the eigenvectors of the covariance matrix of the data.

Eigenvectors for a square matrix are special non-zero vectors whose
direction does not change even after applying the linear transformation
(which means multiplying) with the matrix. They are shown as the
red-colored vectors in the accompanying figure.
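
The following sketch checks the eigenvector property and uses it for a one-component PCA. It assumes NumPy is available and uses randomly generated 2D data; the data and variable names are illustrative:

```python
import numpy as np

# Hypothetical 2D data with most of its spread along one diagonal direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)          # 2 x 2 covariance matrix

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
v = eigenvectors[:, -1]                        # direction of maximum variance
lam = eigenvalues[-1]

# Multiplying by the matrix only scales an eigenvector; its direction is unchanged.
print(np.allclose(cov @ v, lam * v))

# Project the data onto the first principal component: 2 features -> 1.
projected = centered @ v
print(projected.shape)
```

The variance of the projected data equals the largest eigenvalue, which is why eigenvalues are commonly used to decide how many components to keep.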








SINGULAR VALUE DECOMPOSITION


In my opinion, Singular Value Decomposition (SVD) is underrated
and not discussed enough. It is an amazing technique of matrix
decomposition with diverse applications. I will try to cover a few
of them in a future article. For now, let us talk about SVD in
Dimensionality Reduction. Specifically, this is known as Truncated SVD.


• We start with the large m x n numerical data matrix A, where m is the
number of rows and n is the number of features
• Decompose it into 3 matrices, $A = U \Sigma V^T$, where the columns of U
are the left singular vectors, $\Sigma$ is the diagonal matrix of singular
values, and the rows of $V^T$ are the right singular vectors
• Choose k singular values based on the diagonal matrix and truncate
(trim) the 3 matrices accordingly
• Finally, multiply the truncated matrices ($U_k \Sigma_k$) to obtain the
transformed matrix A_k. It has the dimensions m x k. So, it has k features
with k < n
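
A minimal Truncated SVD sketch with NumPy, assuming a random 6 x 4 matrix and k = 2 (both chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))        # hypothetical m x n data matrix (m=6, n=4)

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                               # number of singular values to keep
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Transformed data with k features per row (shape m x k).
A_k = U_k @ np.diag(s_k)
print(A_k.shape)

# Equivalent view: project A onto the top-k right singular vectors.
print(np.allclose(A_k, A @ Vt_k.T))
```

The second check shows that truncating and multiplying is the same as projecting the original rows onto the k strongest directions.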


LINEAR ALGEBRA IN
NATURAL LANGUAGE
PROCESSING


• Word Embeddings
• Latent Semantic Analysis


Natural language processing (NLP) is a subfield of linguistics, computer
science, and artificial intelligence concerned with the
interactions between computers and human language, in particular how to
program computers to process and analyze large amounts of natural language
data. The result is a computer capable of "understanding" the
contents of documents, including the contextual nuances of the
language within them. The technology can then accurately extract information and
insights contained in the documents as well as categorize and organize
the documents themselves.
WORD EMBEDDINGS


Machine learning algorithms cannot work with raw textual data.
We need to convert the text into some numerical and
statistical features to create model inputs. There are
many ways of engineering features from text data, for
example:

• Meta attributes of a text, like word count, special character count,
etc.
• NLP attributes of text using Parts-of-Speech tags and Grammar
Relations, like the number of proper nouns

Word Embeddings is a way of representing words as low-dimensional
vectors of numbers while preserving their context in the document. These
representations are obtained by training different neural networks on a large
amount of text, which is called a corpus. They also help in analyzing
syntactic similarity among words.
[Figure: word-embedding relationships for verb tense and country-capital
pairs (Spain-Madrid, Italy-Rome, Germany-Berlin, Turkey-Ankara,
Russia-Moscow, Canada-Ottawa, Japan-Tokyo, Vietnam-Hanoi, China-Beijing),
and the embedding vector obtained for the word 'world']

Word2Vec and GloVe are two popular models to create Word Embeddings.
I trained my model on the Shakespeare corpus after some light
preprocessing and obtained the embedding, a vector of numbers, for
the word 'world'.


Pretty cool! But what's even more awesome is the plot I obtained
for the vocabulary. Observe that syntactically similar words are closer
together. I have highlighted a few such clusters of words.
LATENT SEMANTIC ANALYSIS (LSA)


Latent semantic analysis (LSA) is a technique in natural language
processing, in particular distributional semantics, of analyzing relationships
between a set of documents and the terms they contain by producing a
set of concepts related to the documents and terms. LSA assumes that
words that are close in meaning will occur in similar pieces of text
(the distributional hypothesis). A matrix containing word counts per document
(rows represent unique words and columns represent each document) is constructed
from a large piece of text, and a mathematical technique called singular
value decomposition (SVD) is used to reduce the number of rows
while preserving the similarity structure among columns. Documents
are then compared by taking the cosine of the angle between the two
vectors (or the dot product between the normalizations of the two
vectors) formed by any two columns. Values close to 1 represent very
similar documents while values close to 0 represent very dissimilar documents.
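
The document comparison step can be sketched as below. The three document vectors are made up for illustration; in LSA they would be columns of the reduced term-document matrix:

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical document vectors (e.g. columns of a reduced term-document matrix).
doc_a = [2.0, 1.0, 0.0]
doc_b = [4.0, 2.0, 0.0]   # same direction as doc_a
doc_c = [0.0, 0.0, 3.0]   # orthogonal to doc_a

print(cosine_similarity(doc_a, doc_b))  # close to 1: very similar
print(cosine_similarity(doc_a, doc_c))  # 0.0: dissimilar
```

Cosine similarity ignores vector length, so a long and a short document about the same topic can still score close to 1.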


Topic Modeling is an unsupervised technique to find topics across 
various text documents. These topics are nothing but clusters of related 
words. Each document can have multiple topics. The topic model outputs 
the various topics, their distributions in each document, and the 
frequency of different words it contains. Latent Semantic Analysis (LSA), 
or Latent Semantic Indexing, is one of the techniques of Topic Modeling. 
It is another application of Singular Value Decomposition. Latent means 
‘hidden’. True to its name, LSA attempts to capture the hidden themes or 
topics from the documents by leveraging the context around the words. 


[Figure: a collection of documents passed through a topic model, which
outputs the topics and the distribution of topics in each document]
LINEAR ALGEBRA IN
COMPUTER VISION


• Image Representation as Tensors
• Convolution and Image Processing


IMAGE REPRESENTATION


How do you account for the 'vision' in Computer Vision? Obviously, a
computer does not process images as humans do. As mentioned
earlier, machine learning algorithms need numerical features to work
with.

A digital image is made up of small indivisible units called pixels. 


Consider the figure below: 














This grayscale image of the digit zero is made of 8 x 8 = 64 pixels. Each
pixel has a value in the range 0 to 255. A value of 0 represents a black
pixel and 255 represents a white pixel. Conveniently, an m x n grayscale
image can be represented as a 2D matrix with m rows and n columns with the
cells containing the respective pixel values:











[Figure: the 8 x 8 matrix of pixel values for the digit image]


But what about a colored image? A colored image is generally stored in
the RGB system. Each image can be thought of as being represented by
three 2D matrices, one for each of the R, G and B channels. A pixel value
of 0 in the R channel represents zero intensity of the Red color and 255
represents the full intensity of the Red color.

Each pixel value is then a combination of the corresponding values in the
three channels.
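
A small sketch of both representations using nested lists. The 3 x 3 images and channel values are made up for illustration:

```python
# A hypothetical 3 x 3 grayscale image: 0 is black, 255 is white.
gray = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]
rows, cols = len(gray), len(gray[0])

# A colour image of the same size: one 2D matrix per channel.
red = [[255, 0, 0], [255, 0, 0], [255, 0, 0]]
green = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
blue = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]

# Each pixel combines the corresponding values from the three channels.
rgb = [[(red[i][j], green[i][j], blue[i][j]) for j in range(cols)]
       for i in range(rows)]
print(rgb[0][0], rgb[0][1], rgb[0][2])
```

Stacking the three channel matrices gives a 3D array of shape rows x columns x 3, which is the tensor view of a colour image.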





CONVOLUTION AND IMAGE PROCESSING


2D Convolution is a very important operation in image processing. It
consists of the below steps:

• Start with a small matrix of weights, called a kernel or a filter
• Slide this kernel on the 2D input data, performing
element-wise multiplication
• Add the obtained values and put the sum in a single output pixel

The function can seem a bit complex but it's widely used for performing
various image processing operations like sharpening and blurring the
images and edge detection. We just need to know the right kernel for the
task we are trying to accomplish. Here are a few kernels you can use:
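
The three steps can be sketched in plain Python. This is an illustrative implementation, assuming no padding and no kernel flip (strictly a cross-correlation, which is what most machine learning libraries call convolution); the image and the two kernels are example values, not the ones from the original slide:

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image ('valid' positions only, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the patch under it, then sum.
            total = sum(kernel[a][b] * image[i + a][j + b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4 x 4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

box_blur = [[1 / 9] * 3 for _ in range(3)]            # averages a 3 x 3 patch
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]   # responds to left-right change

print(convolve2d(image, vertical_edge))
print(convolve2d(image, box_blur))
```

The edge kernel produces large values wherever the patch straddles the dark-to-bright boundary, while the blur kernel replaces each pixel by the average of its neighbourhood.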
EXPLORATORY DATA
ANALYSIS


In statistics, exploratory data analysis (EDA) is an approach
to analyzing data sets to summarize
their main characteristics, often with visual methods. A
statistical model can be used or not, but primarily
EDA is for seeing what the data can tell us beyond the
formal modeling or hypothesis testing task. Exploratory
data analysis was promoted by John Tukey to encourage
statisticians to explore the data, and possibly formulate
hypotheses that could lead to new data collection and experiments.
EDA is different from initial data analysis
(IDA), which focuses more narrowly on checking
assumptions required for model fitting and hypothesis testing, and
handling missing values and making transformations of variables as
needed. EDA encompasses IDA.
Identification of variables and data types
Analyzing the basic metrics 
Non-Graphical Univariate Analysis 
Graphical Univariate Analysis 
Bivariate Analysis 
Variable transformations 
Missing value treatment 
Outlier treatment 
Correlation Analysis 
Dimensionality Reduction 





[Figure: EDA overview showing univariate analysis, missing values, data
preprocessing, outliers, collinearity and data visualization]




DATA MODELING 
SUPERVISED LEARNING 


Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example
input-output pairs. It infers a function from labeled
training data consisting of a set of training
examples.


SUPERVISED LEARNING


• Supervised learning is a machine learning method in which models
are trained using labeled data. In supervised learning, models need to
find the mapping function that maps the input variable (X) to the
output variable (Y):
Y = f(X)

• Supervised learning needs supervision to train the model,
similar to how a student learns in the presence of a teacher.
Supervised learning can be used for two types of problems:
Classification and Regression.
Assume we have an image of different types of fruits. The
task of our supervised learning model is to identify the fruits and
classify them accordingly. So to identify the image in supervised
learning, we give the input data as well as the output for it, which means
we train the model on the shape, size, color, and taste of each
fruit. Once the training is completed, we test the model
by giving it a new set of fruits. The model will
identify the fruit and predict the output using a suitable algorithm.





DATA MODELING 
UNSUPERVISED LEARNING 


Unsupervised learning (UL) is a type of machine learning that uses a data
set with no pre-existing labels and with a minimum of human
supervision, often to look for previously undetected
patterns. In contrast to supervised learning (SL), which
usually makes use of human-labeled data, unsupervised learning,
also known as self-organization, allows for
modeling of probability densities over inputs. It
forms one of the three main categories of machine learning, along with
supervised and reinforcement learning. Semi-supervised learning, a
related variant, makes use of supervised and unsupervised techniques.


UNSUPERVISED LEARNING


• The goal of unsupervised learning is to find the structure and patterns
in the input data. Unsupervised learning does not need any
supervision. Instead, it finds patterns in the data on its own.

• Unsupervised learning can be used for two types of problems:
Clustering and Association.
To understand unsupervised learning, we will use the fruit example given
previously. Unlike supervised learning, here we do not provide any
supervision to the model. We just provide the input dataset to the model
and allow the model to find the patterns in the data.
With the help of a suitable algorithm, the model will train
itself and divide the fruits into different groups according to
the most similar features between them.
SUPERVISED MACHINE
LEARNING


• Supervised learning algorithms are trained using labeled data.

• A supervised learning model takes direct feedback to check whether it
is predicting the correct output.

• A supervised learning model predicts the output.

• In supervised learning, input data is provided to the model
along with the output.

• The goal of supervised learning is to train the model so that it
can predict the output when it is given new data.

• Supervised learning needs supervision to train the model.

• Supervised learning can be categorized into Classification and
Regression problems.

• Supervised learning can be used for those cases where we
know the inputs as well as the corresponding outputs.

• A supervised learning model generally produces more accurate results,
because it is trained against known outputs.

• Supervised learning is not close to true Artificial Intelligence,
because the model must first be trained on labeled examples before
it can predict the correct output.

• It includes various algorithms such as Linear Regression,
Logistic Regression, Support Vector Machine, Multi-class
Classification, Decision Tree, Bayesian Logic, etc.


UNSUPERVISED
MACHINE LEARNING


• Unsupervised learning algorithms are trained using unlabeled
data.

• An unsupervised learning model does not take any feedback.

• An unsupervised learning model finds the hidden patterns in
data.

• In unsupervised learning, only input data is provided to the
model.

• The goal of unsupervised learning is to find the hidden
patterns and useful insights in an unknown dataset.

• Unsupervised learning does not need any supervision to train
the model.

• Unsupervised learning can be classified into Clustering and
Association problems.

• Unsupervised learning can be used for those cases where we
have only input data and no corresponding output data.

• An unsupervised learning model may give less accurate results
compared to supervised learning.

• Unsupervised learning is closer to true Artificial
Intelligence, as it learns in a way similar to how a child learns daily
routine things from experience.

• It includes various algorithms such as Clustering, KNN, and
the Apriori algorithm.


Regression algorithms are used if there is a relationship between
the input variable and the output variable. They are used for
the prediction of continuous variables, such as weather
forecasting, market trends, and so on. The following are some
popular Regression algorithms which come under supervised
learning:

1. Linear Regression
2. Regression Trees
3. Non-Linear Regression
4. Bayesian Linear Regression
5. Polynomial Regression

Classification algorithms are used when the output variable is
categorical, which means there are classes such as Yes-No,
Male-Female, True-False, etc. A typical application is spam filtering.
Popular classification algorithms include:

• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
Clustering: Clustering is a method of grouping objects into
clusters such that objects with the most similarities remain in
one group and have few or no similarities with the objects of
another group. Cluster analysis finds the commonalities
between the data objects and categorizes them according to the
presence and absence of those commonalities.

Association: An association rule is an unsupervised learning
method used for finding relationships between
variables in a large database. It determines the sets of items
that occur together in the dataset. Association rules make
marketing strategy more effective. For example, people who buy item X
(say, bread) also tend to purchase item Y
(butter or jam). A typical example of association rules is
Market Basket Analysis.
UNSUPERVISED
LEARNING ALGORITHMS


K-means clustering
KNN (k-nearest neighbors)
Hierarchical clustering
Anomaly detection
Neural Networks
Principal Component Analysis
Independent Component Analysis
Apriori algorithm
Singular value decomposition

DATA MODELING 
MODEL EVALUATION 


Model evaluation is an integral part of the model
development process. It helps to find the best model
that represents our data and to estimate how well the chosen model
will work in the future. Evaluating model performance with the
data used for training is not acceptable in data
science because it can easily produce
overoptimistic and overfitted models. There are two methods
of evaluating models in data science, Hold-Out and
Cross-Validation. To avoid
overfitting, both methods use a test set (not seen by the
model) to evaluate model performance.


CROSS-VALIDATION 


When only a limited amount of data is available, to achieve an 
unbiased estimate of the model performance we use k-fold cross- 
validation. In k-fold cross-validation, we divide the data into k 
subsets of equal size. We build models k times, each time leaving 
out one of the subsets from training and use it as the test set. If k 
equals the sample size, this is called "leave-one-out". 
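
As a minimal sketch of the procedure described above, the following Python code splits the data into k folds and scores a model once per fold. The helper names (`k_fold_indices`, `cross_validate`) and the toy "predict the training mean" model are illustrative assumptions, not part of the original material.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Return one test score per fold; `fit` and `score` are supplied by the caller."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        test_x = [xs[i] for i in test_idx]
        test_y = [ys[i] for i in test_idx]
        scores.append(score(model, test_x, test_y))
    return scores


# Toy model (assumption for illustration): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mean_abs_error(model, test_x, test_y):
    return sum(abs(model - y) for y in test_y) / len(test_y)


xs = list(range(10))
ys = [2.0 * x for x in xs]
fold_scores = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_abs_error)
# Leave-one-out is the special case k == sample size.
loo_scores = cross_validate(xs, ys, k=len(xs), fit=fit_mean, score=mean_abs_error)
print(fold_scores, loo_scores)
```

In practice the data is usually shuffled before splitting; the contiguous folds here keep the sketch short.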


CONFUSION MATRIX 


A confusion matrix shows the number of correct and incorrect 
predictions made by the classification model compared to the 
actual outcomes (target value) in the data. The matrix is NxN, 
where N is the number of target values (classes). Performance of 
such models is commonly evaluated using the data in the matrix. 
The following table displays a 2x2 confusion matrix for two 
classes (Positive and Negative). 








Confusion Matrix                  Target 
                         Positive      Negative 
Model     Positive          a             b        Positive Predictive Value   a/(a+b) 
          Negative          c             d        Negative Predictive Value   d/(c+d) 
                        Sensitivity   Specificity 
                          a/(a+c)       d/(b+d) 


Accuracy = (a+d)/(a+b+c+d) 





• Accuracy: the proportion of the total number of predictions 
that were correct. 

• Positive Predictive Value or Precision: the proportion of 
predicted positive cases that are actually positive. 

• Negative Predictive Value: the proportion of predicted 
negative cases that are actually negative. 

• Sensitivity or Recall: the proportion of actual positive cases 
which are correctly identified. 

• Specificity: the proportion of actual negative cases which are 
correctly identified. 
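
The five measures above follow directly from the four cells of the 2x2 matrix. A small sketch, assuming the cell names a, b, c, d from the table and illustrative counts (70, 20, 30, 80) chosen for this example:

```python
def confusion_metrics(a, b, c, d):
    """Metrics for a 2x2 confusion matrix.

    a = predicted positive, actually positive (true positives)
    b = predicted positive, actually negative (false positives)
    c = predicted negative, actually positive (false negatives)
    d = predicted negative, actually negative (true negatives)
    """
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "precision": a / (a + b),                    # positive predictive value
        "negative_predictive_value": d / (c + d),
        "sensitivity": a / (a + c),                  # recall
        "specificity": d / (b + d),
    }


# Illustrative counts (an assumption for this sketch).
metrics = confusion_metrics(a=70, b=20, c=30, d=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```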
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Confusion Matrix                  Target 
                         Positive      Negative 
Model     Positive          70            20       Positive Predictive Value   0.78 
          Negative          30            80       Negative Predictive Value   0.73 
                        Sensitivity   Specificity 
                           0.70          0.80 


Accuracy = 0.75 
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Gain or lift is a measure of the effectiveness of a classification 
model, calculated as the ratio between the results 
obtained with and without the model. Gain and lift charts are 
visual aids for evaluating the performance of classification models. 
However, in contrast to the confusion matrix, which evaluates 
models on the whole population, a gain or lift chart evaluates model 
performance in a portion of the population. 
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The lift chart shows how much more likely we are to receive 
positive responses than if we contact a random sample of 
customers. For example, by contacting only 10% of customers 
based on the predictive model we will reach 3 times as many 
respondents, as if we use no model. 
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K-S or Kolmogorov-Smirnov chart measures performance of 
classification models. More accurately, K-S is a measure of the 
degree of separation between the positive and negative 
distributions. The K-S is 100 if the scores partition the population 
into two separate groups in which one group contains all the 
positives and the other all the negatives. On the other hand, if the 
model cannot differentiate between positives and negatives, then 
it is as if the model selects cases randomly from the population, 
and the K-S would be 0. In most classification models the K-S will fall 
between 0 and 100, and the higher the value, the better the 
model is at separating the positive from negative cases. 
EXAMPLE 

The following example shows the results from a classification 
model. The model assigns a score between 0 and 1000 to each 
positive (Target) and negative (Non-Target) outcome. 
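
A rough sketch of how the K-S value described above could be computed from model scores: it takes the largest gap between the cumulative score distributions of the positive and negative cases and reports it on the 0 to 100 scale. The function name and the sample scores are assumptions made for illustration.

```python
def ks_statistic(pos_scores, neg_scores):
    """Largest gap between the two cumulative distributions, scaled to 0..100."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    best = 0.0
    for t in thresholds:
        # Fraction of each group with a score at or below the threshold.
        cum_pos = sum(s <= t for s in pos_scores) / len(pos_scores)
        cum_neg = sum(s <= t for s in neg_scores) / len(neg_scores)
        best = max(best, abs(cum_pos - cum_neg))
    return 100.0 * best


# Hypothetical scores on a 0-1000 scale.
perfect = ks_statistic([800, 900, 950], [100, 200, 300])
no_separation = ks_statistic([1, 2, 3], [1, 2, 3])
partial = ks_statistic([600, 700, 800, 900], [100, 200, 650, 750])
print(perfect, no_separation, partial)
```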
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The ROC chart is similar to the gain or lift charts in that they 
provide a means of comparison between classification models. 
The ROC chart shows false positive rate (1-specificity) on X-axis, 
the probability of target=1 when its true value is 0, against true 
positive rate (sensitivity) on Y-axis, the probability of target=1 
when its true value is 1. Ideally, the curve will climb quickly 
toward the top-left meaning the model correctly predicted the 
cases. The diagonal red line is for a random model (ROC101). 





[Figure: ROC chart plotting sensitivity against 1-specificity, with the diagonal line of a random model] 
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Area under ROC curve is often used as a measure of quality of the 
classification models. A random classifier has an area under the 
curve of 0.5, while AUC for a perfect classifier is equal to 1. In 
practice, most of the classification models have an AUC between 
0.5 and 1. 

An area under the ROC curve of 0.8, for example, means that a 
randomly selected case from the group with the target equal to 1 
has a score larger than that of a randomly chosen case from the 
group with the target equal to 0 in 80% of the cases. When a 
classifier cannot distinguish between the two groups, the area 
will be equal to 0.5 (the ROC curve will coincide with the 
diagonal). When there is a perfect separation of the two groups, 
i.e., no overlap of the distributions, the area under the ROC 
curve reaches 1 (the ROC curve will reach the 
upper left corner of the plot). 
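
The ranking interpretation above can be checked directly: counting, over all positive/negative pairs, how often the positive case scores higher gives the area under the ROC curve. This is a small sketch with made-up scores; counting ties as half a win is a common convention assumed here.

```python
def auc_by_ranking(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for the two groups.
pos = [0.9, 0.8, 0.6, 0.4, 0.35]
neg = [0.7, 0.5, 0.3, 0.2, 0.1]
auc = auc_by_ranking(pos, neg)
print(auc)
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch but not for large datasets.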
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RMSE is a popular formula to measure the error rate of a regression 
model. However, it can only be compared between models whose 
errors are measured in the same units. 


a = actual target 


p= predicted target 
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Unlike RMSE, the relative squared error (RSE) can be compared 
between models whose errors are measured in different units. 
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The mean absolute error (MAE) has the same unit as the original data, 
and it can only be compared between models whose errors are 
measured in the same units. It is usually similar in magnitude to 
RMSE, but slightly smaller. 
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Like RSE, the relative absolute error (RAE) can be compared between 
models whose errors are measured in different units. 
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The coefficient of determination (R2) summarizes the explanatory 
power of the regression model and is computed from the 
sums-of-squares terms. 
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SST (sum of squares total) = SSR (sum of squares regression) + SSE (sum of squares error) 





R2 describes the proportion of variance of the dependent variable 
explained by the regression model. If the regression model is 
“perfect”, SSE is zero, and R2 is 1. If the regression model is a total 
failure, SSE is equal to SST, no variance is explained by regression, 
and R2 is zero. 
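
The regression error measures discussed above can be sketched in a few lines. The definitions used here are the common textbook ones and are an assumption, since the original formulas did not survive in the text: RSE and RAE divide the model's error by the error of a baseline that always predicts the mean of the actual values, and R2 = 1 - SSE/SST. The symbols a (actual) and p (predicted) follow the text.

```python
import math


def regression_metrics(actual, predicted):
    """RMSE, RSE, MAE, RAE and R2 for paired actual (a) and predicted (p) values."""
    n = len(actual)
    mean_a = sum(actual) / n
    sse = sum((p - a) ** 2 for a, p in zip(actual, predicted))   # squared error
    sst = sum((a - mean_a) ** 2 for a in actual)                 # total squares
    sae = sum(abs(p - a) for a, p in zip(actual, predicted))     # absolute error
    sat = sum(abs(a - mean_a) for a in actual)                   # baseline abs error
    return {
        "rmse": math.sqrt(sse / n),
        "rse": sse / sst,
        "mae": sae / n,
        "rae": sae / sat,
        "r2": 1.0 - sse / sst,
    }


# Illustrative values (not from the source).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
m = regression_metrics(actual, predicted)
print(m)
```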
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The standardized residual plot is a useful visualization tool in order 
to show the residual dispersion patterns on a standardized scale. 
There are no substantial differences between the pattern for a 
Standardized residual plot and the pattern in the regular residual 
plot. The only difference is the standardized scale on the y-axis 
which allows us to easily detect potential outliers. 
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n = number of observations 
k = number of predictors 





WEB SCRAPING 


Web scraping is a technique for extracting large amounts of data 
from websites. The term "scraping" refers to obtaining 
the information from another source (webpages) and saving it into a 
local file. For example: suppose you are working 
on a project called "Phone comparing website," where you 
require the prices of mobile phones, ratings, and model names to 
make comparisons between the different mobile phones. If 
you collect these details by checking various 
sites manually, it will take a lot of time. In that case, web 
scraping plays an important role: by writing a 
few lines of code you can get the desired results. Web 
scraping extracts the data from websites in an 
unstructured format. It helps to collect this 
unstructured data and convert it into a structured 
form. Startups prefer web scraping because it is a 
cheap and effective way to get a large amount of data without any 
partnership with a data-selling company. 
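
To make the phone-comparison example concrete, here is a minimal sketch that turns an unstructured HTML fragment into structured records using only Python's standard library. The HTML snippet, class names, and phone models are invented for illustration; a real site would first need its pages downloaded, and its terms of use checked.

```python
from html.parser import HTMLParser

# Hypothetical page fragment (assumption: real markup will differ).
SAMPLE_HTML = """
<ul>
  <li class="phone"><span class="model">Alpha X</span>
      <span class="price">299</span><span class="rating">4.2</span></li>
  <li class="phone"><span class="model">Beta Pro</span>
      <span class="price">449</span><span class="rating">4.6</span></li>
</ul>
"""


class PhoneParser(HTMLParser):
    """Collect model, price and rating from <li class="phone"> items."""

    FIELDS = {"model", "price", "rating"}

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field whose text we are inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "phone":
            self.rows.append({})
        elif tag == "span" and css_class in self.FIELDS:
            self._field = css_class

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = PhoneParser()
parser.feed(SAMPLE_HTML)
# Convert the scraped strings into typed, structured records.
phones = [
    {"model": r["model"], "price": float(r["price"]), "rating": float(r["rating"])}
    for r in parser.rows
]
print(phones)
```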





[Figure: Webpages -> Web Scraping -> Structured Data] 
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decisions. Price monitoring using web scrapped data gives the 
companies the ability to track market conditions and to apply 
dynamic pricing, which helps them stay competitive with 
others. 
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Web scraping is well suited to market trend analysis. 

It helps in gaining insights into a particular market. Large 
organizations require a great deal of data, and web scraping 
can provide that data with a good level of reliability and accuracy. 
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Many companies use personal e-mail data for email marketing. 
They can target a specific audience for their marketing. 
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Web scraping plays an essential role in extracting data from social 
media websites such as Twitter, Facebook, and Instagram, to find 
the trending topics. 
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Web scraping consists of two parts: a web crawler and a web 
scraper. In simple words, the web crawler is the horse, and the 
scraper is the chariot. The crawler leads the scraper, which extracts 
the requested data. Let's look at these two components 
of web scraping: 

• The crawler 

• The scraper 


THE CRAWLER 


A web crawler is generally called a "spider." It is an automated 
program that browses the internet to index and 
search for content by following the given links. It searches for the 
relevant information requested by the programmer. 





THE SCRAPER 


A web scraper is a dedicated tool that is designed to extract the data 
from several websites quickly and effectively. Web scrapers vary 
widely in design and complexity, depending on the projects. 
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REGULAR EXPRESSION 


A regular expression (shortened as regex or regexp, also 
referred to as rational expression) is a sequence of characters 
that defines a search pattern. Usually such patterns are used by 
string-searching algorithms for "find" or "find and replace" 
operations on strings, or for input validation. It is a technique 
developed in theoretical computer science and formal language 
theory. An advanced regular expression that matches any 
numeral is [+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?. 
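
As an illustration of the "find", "find and replace", and input validation uses mentioned above, the numeral pattern can be tried with Python's built-in `re` module. The sample strings are made up for this sketch.

```python
import re

# The numeral pattern from the text: optional sign, digits with an optional
# fraction (or a bare fraction), and an optional exponent.
NUMERAL = re.compile(r"[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?")


def is_numeral(text):
    """Input validation: the whole string must be a numeral."""
    return NUMERAL.fullmatch(text) is not None


# "Find": every numeral inside a longer string.
found = [m.group(0) for m in NUMERAL.finditer("width=3.5 height=-2e3 depth=.25")]

# "Find and replace": mask every numeral.
masked = NUMERAL.sub("#", "call 555 or 42.0")

print(found, masked, is_numeral("+.5E-3"), is_numeral("1e"))
```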
FORMAL DEFINITION 


Regular expressions consist of constants, which denote sets of 
strings, and operator symbols, which denote operations over 
these sets. The following definition is standard, and found as such 
in most textbooks on formal language theory. Given a finite 
alphabet Σ, the following constants are defined as regular 
expressions: 
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• a|b* denotes {ε, "a", "b", "bb", "bbb", ...} 

• (a|b)* denotes the set of all strings with no symbols other than 
"a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", 
"ba", "bb", "aaa", ...} 

• ab*(c|ε) denotes the set of strings starting with "a", then zero 
or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", 
"abb", "abbc", ...} 

• (0|(1(01*0)*1))* denotes the set of binary numbers that are 
multiples of 3: { ε, "0", "00", "11", "000", "011", "110", "0000", 
"0011", "0110", "1001", "1100", "1111", "00000", ... } 












DATA RESHAPING 


A data frame is a table or a two-dimensional array-like 

structure in which each column contains values of one variable 

and each row contains one set of values from each column. 

Following are the characteristics of a data frame. 

• The column names should be non-empty. 

• The row names should be unique. 

• The data stored in a data frame can be of numeric, factor or 
character type. 

• Each column should contain the same number of data items. 
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1. Transpose a Matrix 


2. Joining rows and columns 
in a Data Frame 


3. Melting & Casting 


4. Merging Data Frame 
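
The reshaping operations listed above can be sketched without any library, using lists of dictionaries as a stand-in for a data frame. This is only an illustration of the ideas in plain Python; the function names and the sample rows are assumptions, and dedicated data-frame tools provide much richer versions of each step.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def melt(rows, id_key, value_keys):
    """Wide -> long: one output row per (id, variable) pair."""
    return [
        {id_key: row[id_key], "variable": key, "value": row[key]}
        for row in rows
        for key in value_keys
    ]


def cast(long_rows, id_key):
    """Long -> wide: rebuild one row per id from melted rows."""
    wide = {}
    for r in long_rows:
        wide.setdefault(r[id_key], {id_key: r[id_key]})[r["variable"]] = r["value"]
    return list(wide.values())


def merge(left, right, key):
    """Inner join of two row lists on a shared key."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical sample data.
people = [
    {"id": 1, "height": 170, "weight": 65},
    {"id": 2, "height": 182, "weight": 80},
]
cities = [{"id": 1, "city": "Pune"}, {"id": 3, "city": "Delhi"}]

long_form = melt(people, "id", ["height", "weight"])
round_trip = cast(long_form, "id")
joined = merge(people, cities, "id")
print(transpose([[1, 2, 3], [4, 5, 6]]), long_form, joined)
```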


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DATA MINING 


Data mining is the process of extracting information from huge 
sets of data to identify patterns, trends, and useful data that 
allow a business to make data-driven decisions. 

Data mining is the act of automatically searching large stores 
of information to find trends and patterns that go beyond simple 
analysis procedures. Data mining applies complex mathematical 
algorithms to segment the data and evaluate the probability of 
future events. Data mining is also called Knowledge Discovery in 
Data (KDD). Organizations use data mining to 
extract specific data from huge databases to solve business 
problems. It primarily turns raw data into useful information. 
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APPLICATIONS 


[Figure: Data Mining Applications: Fraud Detection, Lie Detection, Manufacturing, Financial Engineering, Banking, Education] 

[Figure: Challenges in Data Mining: Data Privacy and Security, Distributed Data, Complex] 
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PANDAS 





Pandas is defined as an open-source library that provides 
high-performance data manipulation in Python. The name Pandas 
is derived from the term "panel data", an econometrics term 
for multidimensional data. It is used for data 
analysis in Python and was developed by Wes McKinney in 2008. 


Data analysis requires lots of processing, such as 
restructuring, cleaning or merging. Different 
tools are available for fast data processing, such as Numpy, 
Scipy, Cython, and Pandas. But we prefer Pandas because 
working with Pandas is fast, simple and more expressive than 
other tools. 

Pandas is built on top of the Numpy package, which means Numpy is 
required for operating Pandas. 

Before Pandas, Python was capable of data preparation, but it 
only provided limited support for data analysis. So, Pandas 
came into the picture and enhanced the capabilities of data 
analysis. It can perform five significant steps required for 
processing and analysis of data irrespective of the origin of 
the data, i.e., load, manipulate, prepare, model, and analyze. 


BENEFITS OF PANDAS 


Data Representation: It represents the data in a form that is 
suited for data analysis through its DataFrame and Series. 
Clear code: The clear API of the Pandas allows you to focus on 
the core part of the code. So, it provides clear and concise 
code for the user. 


GRAMMAR OF DATA (NLP) 


NLP stands for Natural Language Processing, which is a part of 
Computer Science, Human Language, and Artificial Intelligence. 
It is the technology that is used by machines to understand, 
analyse, manipulate, and interpret human languages. It helps 
developers to organize knowledge for performing tasks such as 
translation, automatic summarization, Named Entity 
Recognition (NER), speech recognition, relationship extraction, 
and topic segmentation. 


[Figure: NLP at the intersection of Computer Science, Artificial Intelligence, and Human Language] 





• 1948 - In 1948, the first recognisable NLP application 
was introduced at Birkbeck College, London. 

• 1950s - In the 1950s, there was a conflicting view 
between linguistics and computer science. Chomsky 
published his first book, Syntactic Structures, and claimed that 
language is generative in nature. 


An Augmented Transition Network (ATN) is a finite state machine 
that is capable of recognizing regular languages. 

Case Grammar was developed by Linguist Charles J. Fillmore 
in the year 1968. Case Grammar uses languages such as 
English to express the relationship between nouns and verbs 
by using the preposition. 

SHRDLU is a program written by Terry Winograd in 1968-70. 
It helps users to communicate with the computer and move 
objects. It can handle instructions such as "pick up the green 
ball" and also answer questions like "What is inside the 
black box." The main importance of SHRDLU is that it shows 
that syntax, semantics, and reasoning about the world 
can be combined to produce a system that understands a 
natural language. 

LUNAR is the classic example of a natural language database 
interface system that used ATNs and Woods' Procedural 
Semantics. It was capable of translating elaborate natural 
language expressions into database queries and handled 78% of 
requests without errors. 

At the beginning of the 1990s, NLP started growing faster 
and achieved good processing accuracy, especially in English 
grammar. Around 1990, electronic text was also introduced, which 
provided a good resource for training and examining natural 
language programs. Other factors include the availability 
of computers with fast CPUs and more memory. The major 
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processing was the Internet. 

Modern NLP consists of various applications, such as speech 
recognition, machine translation, and machine text reading. 
Combining these applications allows an artificial intelligence 
system to gain knowledge of the world. Consider the example of 
Amazon Alexa: you can ask Alexa a question, and it will reply 
to you. 
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NATURAL LANGUAGE 
UNDERSTANDING (NLU) 
Natural Language Understanding (NLU) helps the machine to 
understand and analyse human language by extracting 
metadata from content, such as concepts, entities, keywords, 

emotion, relations, and semantic roles. 
NLU is mainly used in business applications to understand the 
customer's problem in both spoken and written language. 
NLU involves the following tasks: 
• It is used to map the given input into a useful representation. 
• It is used to analyze different aspects of the language. 


NATURAL LANGUAGE 
GENERATION (NLG) 
Natural Language Generation (NLG) acts as a translator that 
converts the computerized data into natural language 
representation. It mainly involves Text planning, Sentence 
planning, and Text Realization. 
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QUESTION ANSWERING 


Question Answering 
focuses on building 
systems that 
automatically answer the 
questions asked by 
humans in a natural 
language. 





SPAM DETECTION 


Spam detection is used to 
detect unwanted e-mails 
getting to a user's inbox. 





SENTIMENT ANALYSIS 


Sentiment Analysis is also known as opinion mining. It is used on 
the web to analyse the attitude, behaviour, and emotional state 
of the sender. This application is implemented through a 
combination of NLP (Natural Language Processing) and statistics 
by assigning values to the text (positive, negative, or neutral) 
and identifying the mood of the context (happy, sad, angry, etc.). 





MACHINE TRANSLATION 


Machine translation is used to translate text or speech from one 
natural language to another natural language. 
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Microsoft Corporation 
provides word processor 
software like MS-word, 
PowerPoint for the spelling 
correction. 
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RECOGNITION 


Speech recognition is used for converting spoken words into 
text. It is used in applications, such as mobile, home automation, 
video recovery, dictating to Microsoft Word, voice biometrics, 
voice user interface, and so on. 
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


INFORMATION 
EXTRACTION 


Information extraction is one of the most important applications 
of NLP. It is used for extracting structured information from 
unstructured or semi-structured machine-readable documents. 








STATISTICAL MODEL 


A statistical model is a mathematical model that embodies a 
set of statistical assumptions concerning the generation 

of sample data (and similar data from a larger population). 

A statistical model represents, often in considerably idealized 
form, the data-generating process. 


STATISTICS 
This Statistics preparation material will cover the important 
concepts of Statistics syllabus. It contains chapters discussing all 
the basic concepts of Statistics with suitable examples. 
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REGRESSION 


e Marine regression, coastal advance due to falling sea level, 
the opposite of marine transgression 

e Regression (medicine), a characteristic of diseases to 
express lighter symptoms or less extent (mainly for 
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e Regression (psychology), a defensive reaction to some 
unaccepted impulses 


REGRESSION ANALYSIS 
Regression analysis is a statistical technique for estimating the 
relationships among variables. There are several types of 
regression: 


• Linear regression 

• Simple linear regression 
• Logistic regression 

• Nonlinear regression 

• Nonparametric regression 
• Robust regression 

• Stepwise regression 


Regression toward the mean is a common statistical phenomenon. 


STANDARD DEVIATION 
Standard deviation is another measure of dispersion. It means: 
on "average", how far is the data from the mean. The term 
"average" is in quotes because it is not a true average: it is 
computed by squaring the distances, averaging them, and then 
taking the square root. This is done for mathematical reasons and 
gives an approximation of the average distance, but it is not 
exact. (Recall that the exact average distance is calculated using 
absolute values, but the absolute value is not differentiable at 
zero, so it is rarely used.) 
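
A short sketch comparing the two ideas above on a made-up sample: the standard deviation (square, average, square root) and the exact average distance from the mean (mean absolute deviation). Dividing by n, the population form, is an assumption here; the sample form divides by n - 1.

```python
import math


def population_std(values):
    """Square the distances from the mean, average them, take the square root."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def mean_abs_deviation(values):
    """The exact average distance from the mean, using absolute values."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative sample, mean 5
sd = population_std(data)
mad = mean_abs_deviation(data)
print(sd, mad)
```

For this sample the standard deviation is somewhat larger than the average distance, because squaring gives more weight to the points far from the mean.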





COVARIANCE 


Covariance is a measure of correlation between two random 
variables. However, it's a raw measure. That is, the range can be 
from a very large negative number to a very large positive 
number versus negative one to positive one. 
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Much like covariance, correlation coefficient measures the 
correlation between two random variables. However, it is a 
standardized covariance so that it is easy to interpret (think of it 
as the covariance in percentage terms). It ranges from negative 
one (perfect negative relationship) to positive one (positive 
perfect relationship). The closer to zero the coefficient is, the 
weaker the relationship between the two variables. The further 
from zero, the stronger the relationship. 


cor(x, y) = cov(x, y) / (sd(x) * sd(y)) 
REGRESSION SLOPE 


The regression coefficient or slope coefficient represents the 
change in the dependent variable that we expect to see as a 
result of a change in the policy or treatment. This information is 
captured by the formula for a slope, which is literally the change 
in Y associated with the change in X. The way that we measure 
change is through covariance and variance. 
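
The relationship between covariance, correlation, and the slope can be sketched as follows. The sample versions (dividing by n - 1) are an assumption, since the original formulas are not legible; the slope and the correlation are unaffected by that choice because the divisor cancels. The data points are invented.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance (assumption: n - 1 in the denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def variance(x):
    return covariance(x, x)


def correlation(x, y):
    """Covariance standardized by both standard deviations: always in [-1, 1]."""
    return covariance(x, y) / (math.sqrt(variance(x)) * math.sqrt(variance(y)))


def slope(x, y):
    """Change in y associated with a one-unit change in x."""
    return covariance(x, y) / variance(x)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b = slope(x, y)
intercept = mean(y) - b * mean(x)
print(covariance(x, y), correlation(x, y), b, intercept)
```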


cov(x, y) = sum((x_i - mean(x)) * (y_i - mean(y))) / (n - 1) 

cor(x, y) = cov(x, y) / (sd(x) * sd(y)) 

slope = cov(x, y) / var(x) 
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Confidence intervals represent the size of effect for each 
program. As can be seen, program 1 isn't statistically significant 
as the interval captures zero. Program 2 is statistically significant 
but has a negative impact. If you had to pick a program, Program 
1 is a better bet because the range of program slopes includes 
many positive values. It also includes zero and negative values, so 


there is a chance the program might have no effect or a negative 
effect. However, Program 2 will always have a negative impact so 
it is a bad option. Neither choice is great, but I would take a 
probability of a positive impact over a certainty of a negative 
impact. 





BIAS 


Statistical bias is a feature of a statistical technique or of 

its results whereby the expected value of the results differs 
from the true underlying quantitative parameter being 
estimated. The bias of an estimator of a parameter should not 
be confused with its degree of precision as the degree of 
precision is a measure of the sampling error. 
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[Figure: three fits labelled High bias (underfit), "Just right", and High variance (overfit)] 





PARTITIONING THE 
VARIANCE OF Y 
Dividing the variance of Y into sections helps to understand the 
mechanics of regression analysis. Recall the way in which the 
variance is calculated. In this section, scatterplots will serve as an 
illustrative aid. 
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In regression, we use independent variables to attempt to 
understand why we see different values for the dependent 
variable in the data. When X and Y covary, we say that the 
variation in X "explains" part of the variation in Y. Thus, the 
variance of Y can be divided (or partitioned) into two sections: 
the explained portion and the unexplained portion. 
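
The partition into explained and unexplained portions can be verified numerically. This sketch fits a simple least-squares line to made-up data and checks that the total sum of squares equals the explained plus the unexplained sum of squares; the variable names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * a for a in x]
mean_y = sum(y) / len(y)

total = sum((b - mean_y) ** 2 for b in y)                      # variation in Y
explained = sum((p - mean_y) ** 2 for p in predicted)          # explained by X
unexplained = sum((b - p) ** 2 for b, p in zip(y, predicted))  # residual

print(total, explained, unexplained, explained / total)
```

The last printed value is the share of the variation in Y explained by X, which for a one-variable regression equals the squared correlation.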





USING VENN DIAGRAMS 
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As noted above and in the Visual Regression section, adding an 
independent variable divides the total variance into two parts. 
Below, the Ballentine on the left illustrates that X explains the 
portion of the variance of Y that is labeled B. Section A is the 
portion of the variance of Y that is left unexplained. This 
Ballentine corresponds with the scatterplot on the right. 





PARTITIONED 
REGRESSION 

Suppose we're interested in an outcome (our dependent 
variable, Y) and a policy-relevant measure (an independent 
variable, X1), but there's also an important control variable 
(another independent variable, X2) which is related to 
both of them. This relationship is pictured in the 
Ballentine below (X2 is shaded for clarity). 


Since we want to examine the 
relationship between Y and X1, 
we need to remove the 
contaminated covariance with 
X2 (that is, the overlapping 
areas with the shaded X2). To do 
this, we first need bivariate 
regressions for both Y on X2 
and also X1 on X2. When we 
regress Y on X2, the residual 
(also known as the unexplained 
portion or e) is the leftover 
variance of Y after the shaded 
overlap is removed. 
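
The idea can be sketched numerically. The code below computes the two sets of residuals described above and then regresses one on the other; that final step is the standard completion of the procedure and is stated here as an assumption, since only the first residual is described in the text above. The result is compared with the X1 coefficient from a regression of Y on X1 and X2 together. All data values are invented.

```python
def mean(values):
    return sum(values) / len(values)


def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Leftover part of y after removing what x explains."""
    intercept, slope = simple_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def multiple_slope_x1(y, x1, x2):
    """Coefficient on x1 when y is regressed on x1 and x2 together."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # control variable
x1 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]   # policy-relevant variable
y = [3.0, 4.0, 8.0, 8.0, 13.0, 12.0]  # outcome

e_y = residuals(x2, y)     # Y with the X2 overlap removed
e_x1 = residuals(x2, x1)   # X1 with the X2 overlap removed
_, partitioned_slope = simple_fit(e_x1, e_y)
full_model_slope = multiple_slope_x1(y, x1, x2)
print(partitioned_slope, full_model_slope)
```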
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OMITTED VARIABLE BIAS 


As detailed above, omitting a variable 
from a regression model can bias the 
slope estimates for the variables that 
are included in the model. Bias only 
occurs when the omitted variable is 
correlated with both the dependent 
variable and one of the included 
independent variables. 
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Heterogeneity bias (also sometimes called group difference bias) 
occurs when there are natural group structures in the data and 
there are innate differences in the groups that are correlated 
with the study variable. A group can be many individuals in one 
or more time-periods. A “group” can also be one individual 
measured over time. In this example (a medical blood pressure 
study), a regression shows an upward trend in blood pressure as 
dosage increases. 


But when the model accounts for group differences, we see that 
the relationship is actually reversed (that is, we now see a 
downward trend). 


Essentially, this is just a special case of omitted variable bias in 
which the omitted variable is groups. 
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SELECTION BIAS 
Selection bias arises when individuals enter or exit groups in 
non-random ways. For example, if you're evaluating the effect of an 
abstinence-only sex ed program on teen pregnancy and individuals 
choose whether to participate, the results are going to be 
different from a scenario in which individuals are randomly 
assigned to the participating and non-participating groups. 
Essentially, this is just another special case of omitted variable 
bias in which the omitted variable is propensity (to participate). 
Selection In versus Selection Out 
• If propensity to leave a study is correlated with study 
variables, then attrition creates bias. 
• If attrition is random (uncorrelated), then it is not a problem. 
• See slides for a micro-finance example in which only the poor leave 
the program, producing an artificial program effect. 


MULTICOLLINEARITY 


As discussed in Visual Regression, 
multicollinearity occurs when the 
independent variables in a regression 
model are very strongly correlated 
with each other. This makes it 
difficult to tell the independent 
effects of those variables and also 
inflates the standard errors of each 
slope. Below, we see that the higher 
the multicollinearity, the smaller area 

B will be, and the larger the standard 
errors will be. When the standard 
errors are larger, the confidence 
intervals are wider and it is less 
likely that the slope will be 
statistically significant. 





MEASUREMENT ERROR 


As discussed in Visual Regression, measurement error can bias 
the slope estimates and change the standard errors in 
regression. Remember that measurement error here means random 
error, not systematic error. Systematic error always pushes the 
results in the same direction. An example of systematic error 
would be if you're measuring the weight of subjects with a 
miscalibrated scale that adds five pounds to each reading. In this 
case, the mean will differ but the variance will not; the regression 
slope will be identical but the intercept will differ. Thus, the 
slope is not biased by the systematic measurement error. 

• In the Dependent Variable 

• In the Independent Variable 


In the Dependent Variable 


In the Dependent Variable: Measurement error in the 
dependent variable causes additional variance in Y. Introducing 
measurement error in the dependent variable will inflate the 
standard error but the slope estimate will be identical. The 
inflated standard error will, in turn, make it less likely that the 
slope will be statistically significant. 


Measurement Error 





In the Independent Variable 


In the Independent Variable: Measurement error in the 
independent variable causes additional variance in X. 
Introducing measurement error in the independent variable will 
both shrink the slope estimate (called attenuation) and 
reduce the standard error. Attenuation always pushes the slope 
towards zero, no matter if the relationship is positive or negative. 
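
Attenuation is easy to see in a small simulation. The sketch below generates data with a known slope of 2, adds random noise to the independent variable, and compares the two estimated slopes. The sample size, noise levels, and seed are arbitrary assumptions chosen for illustration.

```python
import random


def slope(x, y):
    """Least-squares slope: covariance of x and y divided by variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 * a + rng.gauss(0.0, 0.5) for a in true_x]

# Random measurement error added to the independent variable only.
noisy_x = [a + rng.gauss(0.0, 1.0) for a in true_x]

slope_clean = slope(true_x, y)
slope_noisy = slope(noisy_x, y)
print(slope_clean, slope_noisy)
```

With equal variance in the true X and in the noise, the estimated slope is expected to shrink to roughly half of the true value.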





Measurement Error 


Note: 
B1=B2 


MISSPECIFICATION BIAS 


Bias can be introduced when the analyst does not use the 
proper regression model for the variables under analysis. This 
can be illustrated by Anscombe's quartet, a group of four very 
different datasets that have some identical statistical properties 
(mean, variance, correlation, and regression results). 
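
As a quick check of this point, the sketch below fits a line to the first two datasets of Anscombe's quartet, which share the same x values. The data values are the widely published ones, reproduced here from memory as an assumption. Both give almost the same regression line and correlation, although the second dataset follows a curve rather than a line.

```python
import math

# Anscombe's quartet, datasets I and II (shared x values).
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summarize(x, y):
    """Return (intercept, slope, correlation) of the least-squares line."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


summary_1 = summarize(x, y1)
summary_2 = summarize(x, y2)
print(summary_1)
print(summary_2)
```

Identical summary statistics do not mean that a straight line is the proper model for both datasets, which is why plotting the data before choosing a model matters.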


[Table and figure: Anscombe's quartet. Summary statistics shared by the four datasets: Variance 4.12, Correlation 0.816, Regression y = 3 + 0.5x] 





SIMULTANEITY 


Simultaneity occurs when the causal structure forms a feedback 
loop. When this is the case, it is very difficult to separate out the 
independent effects. Such a causal model is pictured below. 








CLASSIFICATION 
ALGORITHM 


The Classification algorithm is a Supervised Learning 
technique that is used to identify the category of new 
observations on the basis of training data. In Classification, a 
program learns from the given dataset or observations and 
then classifies new observation into a number of classes or 
groups. 





y = f(x), where y = categorical output
Classification algorithms can be better understood using the
below diagram. In the below diagram, there are two classes, Class
A and Class B. The items within each class have features that are
similar to each other and dissimilar to those of the other class.





LEARNERS IN CLASSIFICATION
PROBLEMS
• Lazy Learners: A lazy learner first stores the training dataset
and waits until it receives the test dataset. In the lazy learner
case, classification is done on the basis of the most related
data stored in the training dataset. It takes less time in
training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning.

• Eager Learners: Eager learners develop a classification
model based on a training dataset before receiving a test
dataset. Opposite to lazy learners, eager learners take
more time in training and less time in prediction.
Example: Decision Trees, Naive Bayes, ANN.
TYPES OF ML
CLASSIFICATION
ALGORITHMS
LINEAR MODELS 
• Logistic Regression
• Support Vector Machines
NON-LINEAR MODELS
• K-Nearest Neighbours
• Kernel SVM
• Naive Bayes
• Decision Tree Classification
• Random Forest Classification


Regression Vs Classification 





K-NEAREST NEIGHBOR (KNN)


In statistics, the k-nearest neighbors algorithm (k-NN) is a 
non-parametric machine learning method first developed by 
Evelyn Fix and Joseph Hodges in 1951, and later expanded
by Thomas Cover. It is used for classification and regression.
In both cases, the input consists of the k closest training 
examples in feature space. The output depends on whether 
k-NN is used for classification or regression. 
In k-NN classification, the output is a class membership. An 
object is classified by a plurality vote of its neighbors, with 
the object being assigned to the class most common among 
its k nearest neighbors (k is a positive integer, typically small). 
If k = 1, then the object is simply assigned to the class of that 
single nearest neighbor. 
In k-NN regression, the output is the property value for the 
object. This value is the average of the values of k nearest 
neighbors. 


K NEAREST NEIGHBOUR 
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K-NBAREIS'T 
NEIGHBOR (KNN)
ALGORITHM FOR
MACHINE LEARNING


• K-Nearest Neighbor is one of the simplest Machine Learning
algorithms based on the Supervised Learning technique.

• The K-NN algorithm assumes similarity between the new
case/data and the available cases, and puts the new case into
the category that is most similar to it among the available
categories.

• The K-NN algorithm stores all the available data and classifies
a new data point based on similarity. This means that when new
data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.

• The K-NN algorithm can be used for Regression as well as for
Classification, but mostly it is used for Classification
problems.

• K-NN is a non-parametric algorithm, which means it does
not make any assumption about the underlying data.

• It is also called a lazy learner algorithm because it does not
learn from the training set immediately; instead it stores the
dataset and, at the time of classification, performs an action
on the dataset.

• At the training phase the KNN algorithm just stores the dataset,
and when it gets new data, it classifies that data into the
category that is most similar to the new data.


EXAMPLE

• Suppose we have an image of a creature that looks similar to
both a cat and a dog, and we want to know whether it is a cat or
a dog. For this identification, we can use the KNN algorithm, as
it works on a similarity measure. Our KNN model will compare the
features of the new image with those of the cat and dog
images and, based on the most similar features, will put it in
either the cat or the dog category.


Before K-NN
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Euclidean Distance between Ai and B2= | /(X2-X1)2+!¥2-Y1}" 


After K-NN


• Firstly, we will choose the number of neighbors, so we will
choose k=5.
• Next, we will calculate the Euclidean distance between the
data points. The Euclidean distance is the distance between two
points, which we have already studied in geometry.
• By calculating the Euclidean distance we get the nearest
neighbors: three nearest neighbors in category A and two
nearest neighbors in category B.
• As we can see, 3 of the 5 nearest neighbors are from category
A, hence this new data point must belong to category A.
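
The steps above can be sketched in a few lines. The training points and the query point below are invented for illustration; they are chosen so that, with k=5, three of the nearest neighbors fall in category A and two in category B. The Euclidean distance between points (x1, y1) and (x2, y2) is sqrt((x2 - x1)^2 + (y2 - y1)^2).

```python
import math
from collections import Counter

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(train, query, k=5):
    """train is a list of (point, label); return the plurality label of the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: four points per category.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((0.5, 0.5), "A"),
    ((4.0, 4.0), "B"), ((4.5, 3.5), "B"), ((5.0, 5.0), "B"), ((3.5, 4.5), "B"),
]
query = (2.4, 2.4)
print(knn_classify(train, query, k=5))
```

For k-NN regression the same neighbor search would be used, with the vote replaced by the average of the neighbors' values.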








CROSS VALIDATION 


Cross-validation is a technique for validating model
performance by training the model on a subset of the input data
and testing it on a previously unseen subset of the input data.
We can also say that it is a technique to check how a statistical
model generalizes to an independent dataset.
Basic steps of cross-validation:
• Reserve a subset of the dataset as a validation set.
• Train the model using the training dataset.
• Now, evaluate model performance using the validation set. If
the model performs well with the validation set, perform the
further step, else check for the issues.


Methods used for Cross-Validation 


• Validation Set Approach

• Leave-P-out cross-validation

• Leave one out cross-validation
• K-fold cross-validation

• Stratified k-fold cross-validation
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VALIDATTION S#H'TL 
APPROACH 

• In the validation set approach, we divide our input dataset
into a training set and a test or validation set. Both subsets
are given 50% of the dataset. Its big disadvantage is that we
use only 50% of the dataset to train our model, so the model
may fail to capture important information in the dataset. It
also tends to give an underfitted model.


LEAVE-P-OUT CROSS-
VALIDATION
• In this approach, p data points are left out of the training
data. It means, if there are n data points in total in the
original input dataset, then n-p data points will be used as
the training dataset and the p data points as the validation
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and the average error is calculated to know the 
effectiveness of the model. 


LEAVE-ONE-OUT CROSS-
VALIDATION


• This method is similar to leave-p-out cross-validation,
but instead of p, we take 1 data point out of training. It
means, in this approach, for each learning set, only one data
point is reserved, and the remaining dataset is used to train
the model. This process repeats for each data point. Hence
for n samples, we get n different training sets and n test sets.

It has the following features. In this approach, the bias is
minimal as all the data points are used. The process is executed
n times; hence execution time is high. This approach leads to
high variation in testing the effectiveness of the model, as we
iteratively check against one data point.
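
Both leave-p-out and leave-one-out splitting can be sketched with one index generator. This is a minimal illustration; the function name `leave_p_out_splits` is made up here, and leave-one-out is simply the case p=1.

```python
from itertools import combinations

def leave_p_out_splits(n, p):
    """Yield (train_indices, validation_indices) for every way of holding out p of n points."""
    indices = range(n)
    for held_out in combinations(indices, p):
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, list(held_out)

loo = list(leave_p_out_splits(4, 1))  # leave-one-out: one split per data point
lpo = list(leave_p_out_splits(4, 2))  # leave-2-out: one split per pair of points
print(len(loo), len(lpo))
```

The number of splits grows as "n choose p", which is why leave-p-out becomes expensive quickly for larger p.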


K-FOLD
CROSS-VALIDATION

• The K-fold cross-validation approach divides the input dataset
into K groups of samples of equal size. These groups are
called folds. For each learning set, the prediction function
uses k-1 folds, and the remaining fold is used as the test
set. This is a very popular CV approach because it is
easy to understand, and the output is less biased than with
other methods.

• Let's take an example of 5-fold cross-validation. The
dataset is grouped into 5 folds. On the 1st iteration, the first
fold is reserved to test the model, and the rest are used to
train the model. On the 2nd iteration, the second fold is used to
test the model, and the rest are used to train the model. This
process will continue until each fold has been used as the test
set.
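
A minimal sketch of how the fold indices could be produced is shown below. The function name `k_fold_splits` is made up for this example, and it assumes contiguous, unshuffled folds.

```python
def k_fold_splits(n, k):
    """Split indices 0..n-1 into k contiguous folds; yield (train, test) per iteration."""
    # The first n % k folds get one extra sample when n is not divisible by k.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

splits = list(k_fold_splits(10, 5))
for train, test in splits:
    print("test fold:", test)
```

Every sample appears in exactly one test fold, so each observation is used for both training and testing over the five iterations.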





STRATIFIED K-FOLD 
CROSS-VALIDATION
• This technique is similar to k-fold cross-validation with some
small changes. This approach works on the stratification concept:
it is a process of rearranging the data to ensure that each fold
or group is a good representative of the complete dataset. To
deal with bias and variance, it is one of the best
approaches.
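
One simple way to stratify, sketched here as an assumption rather than as the method the slides have in mind, is to deal the samples of each class out to the folds in round-robin order so that every fold keeps roughly the class proportions of the full dataset.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Hypothetical labels: 6 cats and 3 dogs, split into 3 folds.
labels = ["cat"] * 6 + ["dog"] * 3
folds = stratified_folds(labels, 3)
for fold in folds:
    print([labels[i] for i in fold])
```

Each fold here ends up with two cats and one dog, matching the 2:1 ratio of the whole dataset.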
HOLDOUT METHOD
• This method is the simplest cross-validation technique
of all. In this method, we remove a subset of the
training data, train the model on the rest of the dataset, and
use the removed subset to get prediction results.


COMPARISON OF
CROSS-VALIDATION TO
TRAIN/TEST SPLIT IN
MACHINE LEARNING

• Train/test split: The input data is divided into two parts,
a training set and a test set, in a ratio such as 70:30 or
80:20. The performance estimate has high variance, which is one
of its biggest disadvantages.

1. Training Data: The training data is used to train the model,
and the dependent variable is known.

2. Test Data: The test data is used to make predictions from
the model that is already trained on the training data. It
has the same features as the training data but is not part of
it.

• Cross-Validation dataset: It is used to overcome the
disadvantage of the train/test split by splitting the dataset
into groups of train/test splits and averaging the result. It
can be used if we want to optimize a model that has been trained
on the training dataset for the best performance. It makes more
efficient use of the data than a train/test split, as every
observation is used for both training and testing.


[Figure: Holdout Method (Training), Cross Validation, and, data permitting, Training, Validation, Testing]
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DIMENSIONALITY 





low-dimensional space 


REDUCTION 


Dimensionality reduction, or dimension reduction, is the
transformation of data from a high-dimensional space into a
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representation retains some meaningful properties of the 





original data, ideally close to its intrinsic dimension. 


e It is commonly used in the fields that deal with high- 
dimensional data, such as speech recognition, signal 
processing, bioinformatics, etc. It can also be used for data 
visualization, noise reduction, cluster analysis, etc. 


[Figure: Feature Selection techniques: Missing Value Ratio, Low Variance Filter, High Correlation Filter, Random Forest]
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e Principal Component Analysis 


• Backward Elimination
• Forward Selection
• Score comparison
• Missing Value Ratio
• Low Variance Filter
• High Correlation Filter
• Random Forest
• Factor Analysis
• Auto-Encoder


PRINCIPAL COMPONENT
ANALYSIS (PCA)


• Principal Component Analysis is a statistical process that
converts the observations of correlated features into a set of
linearly uncorrelated features with the help of an orthogonal
transformation. These new transformed features are called
the Principal Components. It is one of the popular tools
used for exploratory data analysis and predictive modeling.
PCA works by considering the variance of each attribute,
because an attribute with high variance shows a good split
between the classes, and hence it reduces the dimensionality.
Some real-world applications of PCA are image processing, movie
recommendation systems, and optimizing the power allocation in
various communication channels.


BACKWARD FEATURE
ELIMINATION

• The backward feature elimination technique is mainly used
while developing a Linear Regression or Logistic Regression
model. The steps below are performed in this technique to reduce
the dimensionality or to select features:

1. Firstly, all the n variables of the given dataset are taken
to train the model.

2. The performance of the model is checked.

3. Now we remove one feature at a time and train the
model on n-1 features, n times, computing the
performance of the model each time.

4. We find the variable whose removal has made the smallest or
no change in the performance of the model, and then we
drop that variable or feature; after that, we are left with
n-1 features.

5. Repeat the complete process until no feature can be dropped.

• In this technique, by selecting the optimum performance of
the model and the maximum tolerable error rate, we can define
the optimal number of features required for the machine
learning algorithm.
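
The procedure can be sketched as a greedy loop. Everything model-specific is hidden behind a caller-supplied `score` function, which is hypothetical here: it stands for "train a model on this feature subset and return its performance". The toy weights and the `max_drop` tolerance are likewise invented for illustration.

```python
def backward_elimination(features, score, max_drop=0.01):
    """Greedily drop the feature whose removal hurts `score` least.

    `score(subset)` must return a performance value where higher is better.
    Stops when every possible removal costs more than `max_drop`.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Try removing each remaining feature in turn.
        trials = [(score([f for f in selected if f != cand]), cand) for cand in selected]
        trial_score, weakest = max(trials)
        if best - trial_score > max_drop:
            break  # every removal costs more than the tolerated loss
        selected.remove(weakest)
        best = trial_score
    return selected

# Toy stand-in for model performance: each feature contributes a fixed amount.
weights = {"age": 0.5, "income": 0.4, "noise1": 0.0, "noise2": 0.005}

def toy_score(subset):
    return sum(weights[f] for f in subset)

kept = backward_elimination(list(weights), toy_score)
print(kept)
```

In this sketch the tolerance is applied step by step against the most recent score, so small losses can accumulate; comparing against the original full-model score would be a stricter alternative.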


FORWARD FEATURE
SELECTION


Forward feature selection follows the inverse process of the
backward elimination process. It means, in this technique, we
don't eliminate features; instead, we find the best features
that can produce the highest increase in the performance of the
model. The steps below are performed in this technique:
• We start with a single feature only, and progressively add
one feature at a time.
• Here we train the model on each candidate feature separately.
• The feature with the best performance is selected.
• The process is repeated until adding a feature no longer gives
a significant increase in the performance of the model.


MISSING VALUE RATIO 


If a variable in a dataset has too many missing values, then we
drop that variable, as it does not carry much useful information.
To perform this, we can set a threshold level, and if a variable
has a higher proportion of missing values than that threshold, we
drop that variable. The lower the threshold value, the more
variables are dropped and the more aggressive the reduction.
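
A minimal sketch of the missing value ratio filter follows. The table layout (a dict of column name to list of values, with `None` marking a missing value) and the column names are assumptions made for this example.

```python
def drop_sparse_columns(table, threshold=0.5):
    """Keep only columns whose fraction of missing (None) values is <= threshold.

    `table` maps a column name to its list of values.
    """
    kept = {}
    for name, values in table.items():
        missing_ratio = sum(v is None for v in values) / len(values)
        if missing_ratio <= threshold:
            kept[name] = values
    return kept

# Hypothetical data: "fax" is missing in 3 of 4 rows, "age" in 1 of 4.
table = {
    "age": [25, None, 31, 40],
    "fax": [None, None, None, "555"],
    "city": ["A", "B", "C", "D"],
}
print(list(drop_sparse_columns(table, threshold=0.5)))
print(list(drop_sparse_columns(table, threshold=0.2)))
```

With the threshold at 0.5 only "fax" is dropped; lowering it to 0.2 also drops "age", which shows how a lower threshold removes more variables.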

LOW VARIANCE FILTER
As with the missing value ratio technique, data columns with
few changes in the data carry less information. Therefore, we
calculate the variance of each variable, and all data
columns with variance lower than a given threshold are dropped,
because low-variance features are unlikely to affect the target
variable.


HIGH CORRELATION
FILTER

High correlation refers to the case when two variables carry
approximately the same information. This redundancy can degrade
the performance of the model. For each pair of independent
numerical variables we calculate the correlation coefficient.
If this value is higher than the threshold value, we can remove
one of the two variables from the dataset. We keep the variable
or feature that shows the higher correlation with the target
variable.
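
The filter can be sketched with a hand-written Pearson correlation. The column names, data values and the 0.9 threshold below are invented for illustration; the sketch also assumes no column is constant, since that would make the correlation undefined.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def high_correlation_filter(columns, target, threshold=0.9):
    """Drop one column from every pair whose |correlation| exceeds threshold,
    keeping the column that is more correlated with the target."""
    kept = list(columns)
    for a in list(columns):
        for b in list(columns):
            if a < b and a in kept and b in kept:
                if abs(pearson(columns[a], columns[b])) > threshold:
                    weaker = min((a, b), key=lambda c: abs(pearson(columns[c], target)))
                    kept.remove(weaker)
    return kept

# Hypothetical data: "income" and "spend" carry almost the same information.
columns = {
    "income": [30, 40, 50, 60, 70],
    "spend": [29, 41, 52, 58, 71],
    "age": [25, 60, 33, 48, 41],
}
target = [3.0, 4.0, 5.0, 6.0, 7.0]
print(high_correlation_filter(columns, target))
```

Here "spend" is removed because it is nearly collinear with "income" and slightly less correlated with the target.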


RANDOM FOREST


Random Forest is a popular and very useful feature selection
algorithm in machine learning. This algorithm comes with a
built-in feature importance measure, so we do not need to program
it separately. In this technique, we generate a large set of
trees against the target variable, and with the help of the usage
statistics of each attribute, we find the subset of
features.
The Random Forest algorithm takes only numerical variables, so we
need to convert the input data into numeric data using one-hot
encoding.


FACTOR ANALYSIS


Factor analysis is a technique in which each variable is kept
within a group according to its correlation with other variables.
It means variables within a group can have a high correlation
among themselves, but they have a low correlation with
variables of other groups.
We can understand it by an example: suppose we have two
variables, Income and Spend. These two variables have a high
correlation, which means people with high income spend more,
and vice versa. So, such variables are put into a group, and that
group is known as the factor. The number of these factors will be
smaller than the original dimension of the dataset.
AUTO-ENCODERS 
One of the popular methods of dimensionality reduction is auto- 
encoder, which is a type of ANN or artificial neural network, and 
its main aim is to copy the inputs to their outputs. In this, the 
input is compressed into latent-space representation, and output 
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It has mainly two parts: 
e Encoder: The function of the encoder is to compress the 
input to form the latent-space representation. 
e Decoder: The function of the decoder is to recreate the 
output from the latent-space representation. 


Dimensionality Reduction

















PRINCIPAL COMPONENT 
ANALYSIS 


Principal Component Analysis is an unsupervised learning algorithm
that is used for dimensionality reduction in machine learning. It
is a statistical process that converts the observations of correlated
features into a set of linearly uncorrelated features with the help of
an orthogonal transformation. These new transformed features are
called the Principal Components. It is one of the popular tools
used for exploratory data analysis and predictive modeling. It is a
technique to draw strong patterns from the given dataset by
reducing the number of dimensions while preserving as much
variance as possible. PCA generally tries to find a lower-
dimensional surface onto which to project the high-dimensional
data. PCA works by considering the variance of each attribute,
because an attribute with high variance shows a good split between
the classes, and hence it reduces the dimensionality. Some
real-world applications of PCA are image processing, movie
recommendation systems, and optimizing the power allocation in
various communication channels. It is a feature extraction
technique, so it retains the most important components and drops
the least important ones.
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Principal Components Analysis 
(PCA) 





PCA ALGORITHM:
MATHEMATICAL
CONCEPTS


Variance and Covariance.
Eigenvalues and Eigenvectors.


MON 


Dimensionality: It is the number of features or variables 
present in the given dataset. More easily, it is the number of 
columns present in the dataset. 

Correlation: It signifies how strongly two variables are
related to each other, such that if one changes, the other
variable also changes. The correlation value ranges from
-1 to +1. Here, -1 occurs if the variables are inversely
proportional to each other, and +1 indicates that the variables
are directly proportional to each other.

Orthogonal: It means that the variables are not correlated with
each other, and hence the correlation between the pair of
variables is zero.

Eigenvectors: Suppose there is a square matrix M and a non-zero
vector v. Then v is an eigenvector of M if Mv is a scalar
multiple of v.

Covariance Matrix: A matrix containing the covariance 
between the pair of variables is called the Covariance Matrix. 
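
For two features, the covariance matrix and its eigenvectors can be worked out by hand, which makes the terms above concrete. The sketch below uses the closed-form eigenvalues of a symmetric 2x2 matrix; the data values are invented, and a real PCA on more features would use a linear algebra library instead.

```python
import math
from statistics import mean

def covariance_matrix_2d(xs, ys):
    """Return (var_x, cov_xy, var_y), the entries of the 2x2 sample covariance matrix."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return var_x, cov_xy, var_y

def principal_components_2d(xs, ys):
    """Eigenvalues (descending) and the first unit eigenvector of the covariance matrix."""
    a, b, c = covariance_matrix_2d(xs, ys)  # matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # (b, lam1 - a) solves (M - lam1*I) v = 0 when b != 0.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)

# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
(lam1, lam2), direction = principal_components_2d(xs, ys)
print(direction, lam1 / (lam1 + lam2))
```

The first principal component points roughly along (1, 2), and the ratio lam1 / (lam1 + lam2) is the share of total variance it explains, which is why the second component can be dropped with little loss here.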


SUPPORT VECTOR MACHINE 
ALGORITHM 
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Supervised Learning algorithms, which is used for Classification 
as well as Regression problems. However, primarily, it is used for 
Classification problems in Machine Learning. The goal of the 
SVM algorithm is to create the best line or decision boundary 
that can segregate n-dimensional space into classes so that we 
can easily put the new data point in the correct category in the 


future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating
the hyperplane. These extreme cases are called support
vectors, and hence the algorithm is termed Support Vector
Machine. Consider the below diagram in which there are two
different categories that are classified using a decision boundary.
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SVM can be understood with the example that we have used in 
the KNN classifier. Suppose we see a strange cat that also has
some features of dogs. If we want a model that can accurately
identify whether it is a cat or a dog, such a model can be
created by using the SVM algorithm. We will first train our model
with lots of images of cats and dogs so that it can learn about
the different features of cats and dogs, and then we test it with
this strange creature. The support vector machine creates a
decision boundary between these two sets of data (cat and dog)
and chooses the extreme cases (support vectors), so it will see
the extreme cases of cat and dog. On the basis of the support
vectors, it will classify it as a cat. Consider the below diagram:











TYPES OF SVM 


• Linear SVM: Linear SVM is used for linearly separable data.
If a dataset can be classified into two classes by
using a single straight line, then such data is termed
linearly separable data, and the classifier used is called a
Linear SVM classifier.

• Non-linear SVM: Non-linear SVM is used for non-linearly
separable data. If a dataset cannot be classified
by using a straight line, then such data is termed
non-linear data, and the classifier used is called a Non-linear
SVM classifier.
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THE SVM ALGORITHM 


Hyperplane: There can be multiple lines/decision boundaries to
segregate the classes in n-dimensional space, but we need to
find the best decision boundary that helps to classify the
data points. This best boundary is known as the hyperplane of
SVM. The dimension of the hyperplane depends on the number of
features present in the dataset: if there are 2 features (as
shown in the image), the hyperplane is a straight line, and if
there are 3 features, the hyperplane is a 2-dimensional plane.
We always create the hyperplane that has the maximum margin,
which means the maximum distance between the hyperplane and the
nearest data points of each class.


Support Vectors: The data points or vectors that are closest
to the hyperplane and that affect its position are termed
support vectors. Since these vectors support the hyperplane,
they are called support vectors.
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Suppose we have two support vectors where 
one is a positive example (e.g., $x^{+}$) and the
other is a negative example (e.g., $x^{-}$).


We also have the weight vector $w$, which is
orthogonal to the decision boundary.


Since the support vectors define the margin,
to calculate the margin we can first subtract
$x^{-}$ from $x^{+}$ and then take the dot product of the
resulting vector with the unit vector of $w$.
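
The margin calculation described above can be written out directly. The numbers below are made up, and they assume the common scaling in which the support vectors satisfy w·x = +1 and w·x = -1 with zero bias; under that assumption, which the slides do not state, the margin equals 2 / ||w||.

```python
import math

def margin(w, x_plus, x_minus):
    """Project (x_plus - x_minus) onto the unit vector w / ||w||."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    diff = [p - m for p, m in zip(x_plus, x_minus)]
    return sum(d * wi for d, wi in zip(diff, w)) / norm_w

w = (3.0, 4.0)             # hypothetical weight vector, ||w|| = 5
x_plus = (0.12, 0.16)      # chosen so that w . x_plus  = +1
x_minus = (-0.12, -0.16)   # chosen so that w . x_minus = -1
m = margin(w, x_plus, x_minus)
print(m, 2 / 5)
```

Because the margin is 2 / ||w|| under this scaling, maximizing the margin is the same as minimizing the length of w.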
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Maximum Margin (xt = x”) 
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LINEAR SVM WORK 


• The working of the SVM algorithm can be understood by using an
example. Suppose we have a dataset that has two tags (green and
blue), and the dataset has two features, x1 and x2. We want a
classifier that can classify the pair (x1, x2) of coordinates as
either green or blue. Consider the below image:








• So, as it is a 2-d space, we can easily separate these two
classes by just using a straight line. But there can be multiple
lines that can separate these classes. Consider the below image:


• Hence, the SVM algorithm helps to find the best line or
decision boundary; this best boundary or region is called a
hyperplane. The SVM algorithm finds the points of both classes
that lie closest to the boundary. These points are called
support vectors. The distance between the support vectors and
the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with maximum margin is
called the optimal hyperplane.

[Figure labels: Support vector, Optimal Hyperplane]


NON-LINEAR SVM





• If data is linearly arranged, then we can separate it by
using a straight line, but for non-linear data, we cannot draw
a single straight line. Consider the below image:





• By adding the third dimension, the sample space will
become as in the below image:


• So now, SVM will divide the datasets into classes in the
following way. Consider the below image:

[Figure: Best Hyperplane]


• Since we are in 3-d space, it looks like a plane parallel to
the x-axis. If we convert it to 2-d space with z=1, then it will
become as:

[Figure: Best Hyperplane]

Hence we get a circumference of radius 1 in the case of
non-linear data.
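
The idea of adding a third dimension can be sketched as follows. The mapping z = x^2 + y^2 is an assumption (a common choice for ring-shaped data, not stated in the surviving text), and the two rings of points are invented for illustration. In the lifted space the plane z = 1 separates the classes, and back in 2-d that plane corresponds to the circle of radius 1.

```python
import math

def lift(point):
    """Map a 2-d point to 3-d by adding z = x^2 + y^2 (assumed mapping)."""
    x, y = point
    return (x, y, x * x + y * y)

def classify(point):
    # In the lifted space the separating plane is z = 1.
    return "inner" if lift(point)[2] < 1 else "outer"

# Hypothetical data: one ring of radius 0.5 inside another of radius 1.5.
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in range(8)]
outer = [(1.5 * math.cos(t), 1.5 * math.sin(t)) for t in range(8)]

print({classify(p) for p in inner}, {classify(p) for p in outer})
```

No straight line in the original 2-d space separates the two rings, but a single threshold on z does.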








RANDOM FOREST 
ALGORITHM 


Random Forest is a popular machine learning algorithm that
belongs to the supervised learning technique. It can be used for
both Classification and Regression problems in ML. It is based on
the concept of ensemble learning, which is a process of
combining multiple classifiers to solve a complex problem and to
improve the performance of the model. As the name suggests,
"Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the
average to improve the predictive accuracy of that dataset."
Instead of relying on one decision tree, the random forest takes
the prediction from each tree and predicts the final output
based on the majority vote of those predictions. A greater number
of trees in the forest generally leads to higher accuracy and
reduces the risk of overfitting.
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Random Forest works in two-phase first is to create the random 
forest by combining N decision trees, and the second is to make
predictions with each tree created in the first phase.
The working process can be explained
in the below steps and diagram:
• Step-1: Select random K data points from the training set.
• Step-2: Build the decision trees associated with the
selected data points (subsets).
• Step-3: Choose the number N of decision trees that you
want to build.
• Step-4: Repeat Steps 1 & 2.
• Step-5: For new data points, find the predictions of each
decision tree, and assign the new data points to the category
that wins the majority votes.
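
The five steps can be sketched with very small "trees". To keep the example short, each tree here is a one-split decision stump on a single numeric feature, and the random subsets are drawn with replacement; both are simplifying assumptions, as are the toy data and the function names.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split tree on (x, label) pairs: predict left_label if x <= threshold."""
    best = None
    for threshold, _ in sample:
        left = [label for x, label in sample if x <= threshold]
        right = [label for x, label in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    return best[1:]

def train_forest(data, n_trees=15, sample_size=None, seed=0):
    rng = random.Random(seed)
    size = sample_size or len(data)
    # Steps 1-4: draw a random subset and build one tree on it, N times.
    return [train_stump([rng.choice(data) for _ in range(size)]) for _ in range(n_trees)]

def predict(forest, x):
    # Step 5: every tree votes; the majority class wins.
    votes = Counter(left if x <= threshold else right for threshold, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: values below 5 are "small", the rest are "large".
data = [(float(x), "small" if x < 5 else "large") for x in range(10)]
forest = train_forest(data)
print(predict(forest, 0.0), predict(forest, 9.0))
```

Individual stumps disagree near the class boundary because each saw a different random subset; the majority vote smooths out those differences.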
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Suppose there is a 
dataset that contains 
multiple fruit images. 

So, this dataset is given to the Random Forest classifier. The
dataset is divided into subsets, and each subset is given to a
decision tree. During the training phase, each decision tree
produces a prediction result, and when a new data point occurs,
the Random Forest classifier predicts the final decision based
on the majority of results.


Final-Class 


Forest (n=10,001 trees)
The more often a gene is chosen as a splitter variable, the
higher its “Variable Importance”. This can be used to
prioritize which genes to select for an assay with
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DECISION TREE ALGORITHM 


Decision tree learning is one of the predictive modelling 
approaches used in statistics, data mining and machine learning. 
It uses a decision tree (as a predictive model) to go from 
observations about an item (represented in the branches) to 
conclusions about the item's target value (represented in the 
leaves). Tree models where the target variable can take a discrete 
set of values are called classification trees; in these tree 
structures, leaves represent class labels and branches represent 
conjunctions of features that lead to those class labels. Decision 
trees where the target variable can take continuous values 
(typically real numbers) are called regression trees. Decision 
trees are among the most popular machine learning algorithms 
given their intelligibility and simplicity.





DECISION TREE 


[Figure: example decision tree showing the root node, decision nodes (for example, "Does the recommended product suit my outfit?") and leaf nodes ("accept offer", "decline offer")]





Decision Tree is a Supervised learning technique that can be 
used for both classification and Regression problems, but 
mostly it is preferred for solving Classification problems. It is 
a tree-structured classifier, where internal nodes represent 
the features of a dataset, branches represent the decision 
rules and each leaf node represents the outcome. 

In a Decision tree, there are two types of nodes, which are the
Decision Node and the Leaf Node.

Decision nodes are used to make any decision and have 
multiple branches, whereas Leaf nodes are the output of 
those decisions and do not contain any further branches. 

The decisions or the test are performed on the basis of 
features of the given dataset. 

It is a graphical representation for getting all the possible 
solutions to a problem /decision based on given conditions. 

It is called a decision tree because, similar to a tree, it starts 
with the root node, which expands on further branches and 
constructs a tree-like structure. 

In order to build a tree, we use the CART algorithm, which 
stands for Classification and Regression Tree algorithm. 


A decision tree simply asks a question and, based on the
answer (Yes/No), further splits the tree into subtrees.
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Root Node: Root node is from where the decision tree starts. 
It represents the entire dataset, which further gets divided 
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Leaf Node: Leaf nodes are the final output node, and the tree 
cannot be segregated further after getting a leaf node. 
Splitting: Splitting is the process of dividing the decision 
node/root node into sub-nodes according to the given 
conditions. 

Branch /Sub Tree: A tree formed by splitting the tree. 
Pruning: Pruning is the process of removing the unwanted 
branches from the tree. 

Parent/Child node: The root node of the tree is called the 
parent node, and other nodes are called the child nodes. 


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While implementing a Decision tree, the main issue arises that 
how to select the best attribute for the root node and for
sub-nodes. To solve this problem, there is a technique
called the attribute selection measure, or ASM. With this
measure, we can easily select the best attribute for the
nodes of the tree. There are two popular techniques for ASM,
which are:

e Information Gain 

e Gini Index 
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Information gain is the measurement of changes in entropy 
after the segmentation of a dataset based on an attribute. 

It calculates how much information a feature provides us 
about a class. 

According to the value of information gain, we split the node 
and build the decision tree. 

A decision tree algorithm always tries to maximize the value 
of information gain, and a node /attribute having the highest 
information gain is split first. It can be calculated using the 
below formula: 

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]


ENTROPY
• Entropy is a metric to measure the impurity in a given
attribute. It specifies randomness in data. Entropy can be
calculated as:
$\mathrm{Entropy}(S) = -P(\text{yes}) \log_2 P(\text{yes}) - P(\text{no}) \log_2 P(\text{no})$


S = the set of samples, P(yes) = probability of yes,
P(no) = probability of no
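
Entropy and information gain can be computed directly from label counts. The sketch below generalises the yes/no formula to any number of classes; the example split is invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the groups after a split."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical node with 5 "yes" and 5 "no", split into two groups by some attribute.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(entropy(parent), round(information_gain(parent, split), 4))
```

An evenly mixed node has entropy 1 bit, and a split that separated the classes perfectly would have an information gain of 1; the partial split here gains about 0.278 bits.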


GINI INDEX
The Gini index is a measure of impurity or purity used while
creating a decision tree in the CART (Classification and
Regression Tree) algorithm.
An attribute with a low Gini index should be preferred over
one with a high Gini index.
The CART algorithm creates only binary splits, and it uses the
Gini index to choose them.
The Gini index can be calculated using the below formula:


$\text{Gini Index} = 1 - \sum_j P_j^2$
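
The formula translates into a few lines of code, where P_j is the proportion of samples in class j. The label lists are invented for illustration.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

mixed = ["yes"] * 5 + ["no"] * 5   # evenly mixed node
pure = ["yes"] * 10                # pure node
print(gini_index(mixed), gini_index(pure))
```

A pure node has a Gini index of 0 and an evenly mixed two-class node has 0.5, which is why attributes producing a lower Gini index are preferred.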
EXAMPLE


Suppose there is a candidate who has a job offer and wants to 
decide whether he should accept the offer or Not. So, to solve 
this problem, the decision tree starts with the root node 
(Salary attribute by ASM). The root node splits further into 
the next decision node (distance from the office) and one leaf 
node based on the corresponding labels. The next decision 
node further gets split into one decision node (Cab facility) 
and one leaf node. Finally, the decision node splits into two 
leaf nodes (Accepted offers and Declined offer). Consider the 
below diagram: 


[Figure: decision tree for the job offer example. Nodes: "Salary is between $50000-$80000" (root), "Office near to home", "Provides Cab facility". Leaves: "Accepted offer" and "Declined offer"]








MAPREDUCE 


MapReduce is a processing technique and a programming model for
distributed computing based on Java. The MapReduce algorithm
contains two important tasks, namely Map and Reduce. Map
takes a set of data and converts it into another set of data, where
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pairs). Secondly, reduce task, which takes the output from a map 
as an input and combines those data tuples into a smaller set of
tuples. As the sequence of the name MapReduce implies, the
reduce task is always performed after the map job. The major
advantage of MapReduce is that it is easy to scale data processing
over multiple computing nodes. Under the MapReduce model,
the data processing primitives are called mappers and reducers.
Decomposing a data processing application into mappers and
reducers is sometimes nontrivial. But once we write an
application in the MapReduce form, scaling the application to run
over hundreds, thousands, or even tens of thousands of machines
in a cluster is merely a configuration change. This simple
scalability is what has attracted many programmers to use the
MapReduce model.





[Figure: the overall MapReduce word count process. Stages: Splitting, Mapping, Shuffling, Reducing, Final result.]





Generally, the MapReduce paradigm is based on sending the
computation to where the data resides.

A MapReduce program executes in three stages, namely the map
stage, the shuffle stage, and the reduce stage.

Map stage - The map or mapper's job is to process the input
data. Generally the input data is in the form of a file or
directory and is stored in the Hadoop file system (HDFS).

The input file is passed to the mapper function line by line.
The mapper processes the data and creates several small
chunks of data.

Reduce stage - This stage is the combination of the Shuffle
stage and the Reduce stage. The Reducer's job is to process
the data that comes from the mapper. After processing, it
produces a new set of output, which is stored in
HDFS.

During a MapReduce job, Hadoop sends the Map and Reduce 
tasks to the appropriate servers in the cluster. 

The framework manages all the details of data-passing such 
as issuing tasks, verifying task completion, and copying data 
around the cluster between the nodes. 

Most of the computing takes place on nodes with data on
local disks, which reduces the network traffic.

After completion of the given tasks, the cluster collects and
reduces the data to form an appropriate result, and sends it
back to the Hadoop server.
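
The map, shuffle, and reduce stages can be imitated on a single machine to show the data flow. The sketch below is plain Python, not Hadoop code: the function names and the three input lines are assumptions chosen for illustration, and a real job would distribute these steps across the cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run the three stages in order over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input split into three lines.
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
counts = word_count(lines)
print(counts)
```

Because each mapper call only sees one line and each reducer call only sees one key, the same logic can run on many nodes in parallel.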





BAYES' THEOREM


Bayes' theorem is also known as Bayes' rule, Bayes' law, or
Bayesian reasoning. It determines the probability of an event
under uncertain knowledge. In probability theory, it relates the
conditional probability and marginal probabilities of two random
events. Bayes' theorem is named after the British
mathematician Thomas Bayes. Bayesian inference is an
application of Bayes' theorem and is fundamental to Bayesian
statistics. It is a way to calculate the value of P(B|A) with the
knowledge of P(A|B). Bayes' theorem allows updating the
probability prediction of an event by observing new information
from the real world.
EXAMPLE
• If cancer is related to a person's age, then by using Bayes'
theorem we can determine the probability of cancer more
accurately with the help of age. Bayes' theorem can be
derived using the product rule and the conditional probability
of event A given known event B:
P(A ∧ B) = P(A|B) P(B) or P(A ∧ B) = P(B|A) P(A)


[Figure: Bayes' theorem, P(H|D) = P(D|H) P(H) / P(D).
Likelihood P(D|H): probability of collecting this data when our hypothesis is true.
Prior P(H): the probability of the hypothesis being true before collecting data.
Marginal P(D): what is the probability of collecting this data under all possible hypotheses?
Posterior P(H|D): the probability of our hypothesis being true given the data collected.]





$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \qquad \text{(a)}$$

$$P(A_i|B) = \frac{P(A_i)\,P(B|A_i)}{\sum_{k} P(A_k)\,P(B|A_k)}$$





• P(A|B) is known as the posterior, which we need to calculate. It
is read as the probability of hypothesis A given that evidence B
has occurred.
• P(B|A) is called the likelihood: assuming the hypothesis is
true, we calculate the probability of the evidence.
• P(A) is called the prior probability, the probability of the
hypothesis before considering the evidence.
• P(B) is called the marginal probability, the probability of the
evidence alone.
In equation (a), in general, we can write

$$P(B) = \sum_i P(A_i)\,P(B|A_i)$$
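
A small numeric sketch may help make the roles of prior, likelihood, and marginal concrete. The function name and the medical-test numbers below are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) from P(A), P(B|A), and P(B|not A) via Bayes' theorem."""
    # Marginal P(B) by summing over the two hypotheses A and not-A.
    marginal = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / marginal

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.90, likelihood_given_not=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior stays well below the likelihood here because the prior is small.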
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e Bayesian belief network is key computer technology for 
dealing with probabilistic events and for solving problems
that involve uncertainty. We can define a Bayesian network
as "A Bayesian network is a probabilistic graphical model
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dependencies using a directed acyclic graph." It is also 

called a Bayes network, belief network, decision network,
or Bayesian model. Bayesian networks are probabilistic because
they are built from a probability distribution and also
use probability theory for prediction and anomaly detection.
[Figure: Directed Acyclic Graph]
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NAIVE BAYES CLASSIFIER 


Naive Bayes classifiers are highly scalable, requiring a number of
parameters linear in the number of variables
(features/predictors) in a learning problem. Maximum-likelihood
training can be done by evaluating a closed-form expression,
which takes linear time, rather than by expensive iterative
approximation as used for many other types of classifiers.


NAIVE BAYES
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e Naive Bayes algorithm is a supervised learning algorithm, 
which is based on Bayes' theorem and used for solving
classification problems.

• It is mainly used in text classification, which involves
high-dimensional training datasets.

• The Naive Bayes classifier is one of the simplest and most
effective classification algorithms; it helps in building fast
machine learning models that can make quick predictions.

• It is a probabilistic classifier, which means it predicts on the
basis of the probability of an object.

• Some popular applications of the Naive Bayes algorithm are spam
filtering, sentiment analysis, and classifying articles.

• Naive: It is called naive because it assumes that the
occurrence of a certain feature is independent of the
occurrence of other features. For example, if a fruit is identified
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and sweet fruit is recognized as an apple. Hence each feature 
individually contributes to identifying it as an apple, without
depending on the others.

• Bayes: It is called Bayes because it depends on the principle
of Bayes' theorem.
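
The points above can be illustrated with a tiny text classifier. This is a minimal sketch of a multinomial Naive Bayes model with add-one (Laplace) smoothing, written in plain Python. The function names and the four training documents are made up for the example, and a production system would normally use an established library instead.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Pick the class with the highest log-posterior under the naive assumption."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing avoids zero probabilities for unseen words.
            p = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(p)  # each word is treated as independent
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set.
docs = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]
model = train(docs)
print(predict("free money", *model))
print(predict("meeting today", *model))
```

Training only counts words per class, which is the closed-form, linear-time estimation mentioned earlier.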





CLUSTER ANALYSIS 


Cluster analysis or clustering is the task of grouping a set of 
objects in such a way that objects in the same group (called a 
cluster) are more similar (in some sense) to each other than to 
those in other groups (clusters). It is a main task of exploratory 
data mining, and a common technique for statistical data 
analysis, used in many fields, including pattern recognition, 
image analysis, information retrieval, bioinformatics, data 
compression, computer graphics and machine learning. Cluster
analysis originated in anthropology with Driver and Kroeber in
1932, was introduced to psychology by Joseph Zubin in 1938 and
Robert Tryon in 1939, and was famously used by Cattell beginning in
1943 for trait theory classification in personality psychology.

• Clustering or cluster analysis is a machine learning technique
that groups an unlabelled dataset. It can be defined as "A
way of grouping the data points into different clusters,
consisting of similar data points. The objects with the possible
similarities remain in a group that has less or no similarities
with another group."

• It does so by finding similar patterns in the unlabelled
dataset, such as shape, size, color, and behavior, and divides
them according to the presence or absence of those
patterns.

• It is an unsupervised learning method, so no supervision is
provided to the algorithm, and it deals with an unlabelled
dataset.

• After applying this clustering technique, each cluster or group
is given a cluster ID. An ML system can use this ID to
simplify the processing of large and complex datasets.

• The clustering technique is commonly used for statistical
data analysis.





[Figure: Unlabelled Data and Labelled Clusters]
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Let's understand the clustering technique with the real- 
world example of a mall: when we visit any shopping mall, we
can observe that things with similar usage are grouped
together. For example, t-shirts are grouped in one section and
trousers in another; similarly, in the vegetable section,
apples, bananas, mangoes, etc. are grouped separately
so that we can easily find things. The
clustering technique works in the same way. Another
example of clustering is grouping documents according to
topic.
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e The clustering methods are broadly divided into Hard 
clustering (each data point belongs to only one group) and Soft
Clustering (a data point can belong to more than one group).

Various other clustering approaches also exist. Below are the
main clustering methods used in machine learning.
Partitioning Clustering
Density-Based Clustering
Distribution Model-Based Clustering
Hierarchical Clustering
Fuzzy Clustering


PARTITIONING CLUSTERING

It is a type of clustering that divides the data into
non-hierarchical groups. It is also known as the
centroid-based method. The most common example of
partitioning clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k groups,
where K is used to define the number of pre-defined groups.
The cluster center is created in such a way that the
distance between the data
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minimum as compared to 
another cluster centroid. 
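
The centroid-based idea can be sketched with a bare-bones K-Means loop. This is an illustrative version only: seeding the centroids with the first k points, the fixed iteration count, and the six sample points are all assumptions made to keep the example deterministic. It relies on `math.dist`, which is available from Python 3.8.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Hypothetical data: two visually separate groups of three points each.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Each iteration alternates between assigning points to the nearest centroid and recomputing the centroids, which is what keeps every point closer to its own cluster center than to any other.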





DENSITY-BASED CLUSTERING

The density-based clustering method connects highly dense
areas into clusters, and arbitrarily shaped distributions are
formed as long as the dense region can be connected. The
algorithm does this by identifying different clusters in the
dataset and connecting the areas of high density into
clusters. The dense areas in data space are divided from
each other by sparser areas.





DISTRIBUTION MODEL-BASED CLUSTERING
In the distribution model-based clustering method, the data is
divided based on the probability that a data point belongs to a
particular distribution. The grouping is done by assuming some
distribution, most commonly the Gaussian distribution.
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Hierarchical clustering 
can be used as an alternative to partitioning clustering,
as there is no requirement of pre-specifying the number
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created. In this 
technique, the dataset
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to create a_ tree-like 
structure, which is also called a dendrogram. The observations
or any number of clusters can be selected by cutting the tree
at the correct level. The most
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BRPUAAY CLUS THRING 


Fuzzy clustering is a type of soft method in which a data object
may belong to more than one group or cluster. Each data object has
a set of membership coefficients, which give its degree of
membership in each cluster. The fuzzy C-means algorithm is an
example of this type of clustering; it is sometimes also known as
the fuzzy k-means algorithm.
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The Clustering algorithms can be divided based on their models 
that are explained above. Many clustering algorithms have been
published, but only a few are commonly used.

The choice of clustering algorithm depends on the kind of data
we are using.


K-Means algorithm: The k-means algorithm is one of the
most popular clustering algorithms. It classifies the dataset
by dividing the samples into different clusters of equal
variances. The number of clusters must be specified in this
algorithm. It is fast and requires few computations, with
a complexity that is linear in the number of samples, O(n).

Mean-shift algorithm: Mean-shift algorithm tries to find the 
dense areas in the smooth density of data points. It is an 
example of a centroid-based model, that works on updating 
the candidates for centroid to be the center of the points 
within a given region. 

DBSCAN Algorithm: It stands for Density-Based Spatial 
Clustering of Applications with Noise. In this algorithm, the 
areas of high density are separated by the areas of low 
density. 

Expectation-Maximization Clustering using GMM: This
algorithm can be used as an alternative to the k-means
algorithm or for cases where k-means may fail. In
GMM, it is assumed that the data points are Gaussian
distributed.

Agglomerative Hierarchical algorithm: The Agglomerative 
hierarchical algorithm performs the bottom-up hierarchical 
clustering. In this, each data point is treated as a single 
cluster at the outset and then successively merged. 

Affinity Propagation: It differs from other clustering
algorithms in that it does not require the number of
clusters to be specified.


ARTIFICIAL NEURAL 
NETWORK 


Artificial neural networks (ANNs), usually simply called neural 
networks (NNs), are computing systems vaguely inspired by the 
biological neural networks that constitute animal brains. An ANN 
is based on a collection of connected units or nodes called 
artificial neurons, which loosely model the neurons in a biological 
brain. Each connection, like the synapses in a biological brain,
can transmit a signal to other neurons. An artificial neuron that
receives a signal then processes it and can signal neurons
connected to it. The "signal" at a connection is a real number,
and the output of each neuron is computed by some non-linear
function of the sum of its inputs. The connections are called
edges. Neurons and edges typically have a weight that adjusts as 
learning proceeds. The weight increases or decreases the 
strength of the signal at a connection. Neurons may have a 
threshold such that a signal is sent only if the aggregate signal 
crosses that threshold. Typically, neurons are aggregated into 
layers. Different layers may perform different transformations on 
their inputs. Signals travel from the first layer (the input layer), to 
the last layer (the output layer), possibly after traversing the 
layers multiple times. 









[Figure: biological neuron; recoverable label: Dendrite]





• The given figure illustrates the typical diagram of a biological
neural network.

• A typical artificial neural network looks something like the
given figure.
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Dendrites from Biological Neural Network represent inputs in 
Artificial Neural Networks, cell nucleus represents Nodes, 
synapse represents Weights, and Axon represents Output. 


MCCULLOCH-PITTS MODEL OF NEURON
An early model of an artificial neuron was introduced by
Warren McCulloch and Walter Pitts in 1943. The McCulloch-Pitts
neural model is also known as a linear threshold gate. It is
a neuron with a set of inputs I1, I2, ..., Im and one output y. The
linear threshold gate simply classifies the set of inputs into
two different classes. Thus the output y is binary.


$$\text{Sum} = \sum_{i=1}^{m} I_i W_i$$

$$y = f(\text{Sum})$$


W1, W2, ..., Wm are weight values normalized in the range of
either (0,1) or (-1,1) and associated with each input line, Sum is
the weighted sum, and T is a threshold constant. The function f
is a linear step function at the threshold T.
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Input is multi-dimensional (i.e. input can be a vector): 

input x = (I1, I2, ..., In)

Input nodes (or units) are connected (typically fully) to a node
(or multiple nodes) in the next layer. A node in the next layer
takes a weighted sum of all its inputs:

$$\text{Summed input} = \sum_{i} w_i I_i$$

Rule: If the summed input ≥ t, then it "fires" (output y = 1).
Else (summed input < t) it doesn't fire (output y = 0).

$$y = \begin{cases} 1 & \text{if } \sum_i w_i I_i \ge t \\ 0 & \text{if } \sum_i w_i I_i < t \end{cases}$$
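
The firing rule can be written directly as a short function. The sketch below assumes unit weights and two hand-picked thresholds purely for illustration. With those choices the same neuron behaves like a logical OR or a logical AND.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Linear threshold gate: fire (1) when the weighted sum reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With unit weights, threshold 1 behaves like OR and threshold 2 like AND.
for a in (0, 1):
    for b in (0, 1):
        or_out = mcculloch_pitts((a, b), (1, 1), threshold=1)
        and_out = mcculloch_pitts((a, b), (1, 1), threshold=2)
        print(a, b, or_out, and_out)
```

Changing only the threshold changes which input patterns fall into the "fires" class, which is the sense in which the gate classifies its inputs into two classes.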







The term "Artificial Neural Network" is derived from the
biological neural networks that make up the structure of the
human brain. Just as the human brain has neurons
interconnected with one another, artificial neural
networks have neurons that are interconnected with
one another in the various layers of the network.
These neurons are known as nodes.

An artificial neural network is a technique in the field of
artificial intelligence that attempts to mimic the network of
neurons that makes up a human brain, so that computers have
a way to understand things and make decisions in a
human-like manner. The artificial neural network is designed
by programming computers to behave like
interconnected brain cells.

There are roughly 86 billion neurons in the human brain.
Each neuron has somewhere between 1,000 and 100,000
connections to other neurons. In the human brain, data is
stored in a distributed manner, and we can
retrieve more than one piece of this data from memory
in parallel when necessary. We can say that the human brain
is made up of remarkably powerful parallel processors.

We can understand the artificial neural network with an
example. Consider a digital logic gate that takes
inputs and gives an output, such as an "OR" gate, which takes two
inputs. If one or both of the inputs are "On," then we get "On" as
output. If both inputs are "Off," then we get "Off" as
output. Here the output depends only on the input. Our brain does
not work the same way: the relationship between outputs and inputs
keeps changing because the neurons in our brain are
"learning."
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To understand the concept of the architecture of an artificial 
neural network, we have to understand what a neural network
consists of. A neural network consists of a
large number of artificial neurons, termed units,
arranged in a sequence of layers. Let us look at the various types of
layers available in an artificial neural network.
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Input Layer: As the name suggests, it accepts inputs in several 
different formats provided by the programmer. 

Hidden Layer: The hidden layer sits between the input and
output layers. It performs all the calculations to find hidden
features and patterns.

Output Layer: The input goes through a series of transformations
in the hidden layer, which finally results in the output that is
conveyed by this layer. The artificial neural network takes the
input, computes the weighted sum of the inputs, and adds
a bias. This computation is represented in the form of a transfer
function.
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Artificial Neural Network can be best represented as a weighted 
directed graph, where the artificial neurons form the nodes. The
association between neuron outputs and neuron inputs can
be viewed as directed edges with weights. The artificial
neural network receives the input signal from an external
source, such as a pattern or an image, in the form of a vector.
These inputs are then denoted mathematically by the notation
x(n) for each of the n inputs.
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Error signals 


Afterward, each input is multiplied by its corresponding
weight (these weights are the details the artificial
neural network uses to solve a specific problem). In general terms,
these weights represent the strength of the
interconnection between neurons inside the artificial neural
network. All the weighted inputs are summed inside the
computing unit.


BINARY (ANN) 


In the binary activation function, the output is either 1 or 0.
To accomplish this, a threshold value is set up. If the
net weighted input of the neuron exceeds the threshold (here, 1), the
final output of the activation function is 1; otherwise the
output is 0.


SIGMOIDAL HYPERBOLIC (ANN)


The sigmoidal hyperbolic function is generally seen as an "S"
shaped curve. Here the tan hyperbolic function is used to
approximate output from the actual net input. The function is
defined as: F(x) = 1 / (1 + exp(-λx))

where λ is the steepness parameter.
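
The weighted sum with bias and the two activation functions described above can be sketched as follows. The function names, the `steepness` argument, and the sample inputs and weights are assumptions made for this example. The S-shaped function is implemented exactly as the formula is written, that is, as a logistic curve.

```python
import math

def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias term."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def binary_activation(net, threshold=1.0):
    """Return 1 when the net input exceeds the threshold, otherwise 0."""
    return 1 if net > threshold else 0

def sigmoid_activation(net, steepness=1.0):
    """Logistic S-shaped curve; `steepness` scales how sharply it rises."""
    return 1.0 / (1.0 + math.exp(-steepness * net))

# Hypothetical inputs, weights, and bias.
net = net_input(inputs=[0.5, 1.0], weights=[0.8, 0.6], bias=0.2)
print(net)
print(binary_activation(net))
print(sigmoid_activation(net))
```

The binary function gives a hard 0 or 1 decision, while the S-shaped function maps the same net input smoothly into the range between 0 and 1.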
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In this type of ANN, the output returns into the network to 
accomplish the best-evolved results internally. According to the
University of Massachusetts Lowell Centre for Atmospheric
Research, feedback networks feed information back into
themselves and are well suited to solving optimization problems.
Internal system error correction makes use of feedback ANNs.


FEED-FORWARD (ANN)


A feed-forward network is a basic neural network consisting of
an input layer, an output layer, and at least one layer of neurons.
By assessing its output in light of its input, the
strength of the network can be observed from the group behavior
of the connected neurons, and the output is decided. The
primary advantage of this network is that it learns to
evaluate and recognize input patterns.


CONVOLUTIONAL NEURAL 
NETWORK 


In deep learning, a convolutional neural network
(CNN, or ConvNet) is a class of deep neural networks, most
commonly applied to analyzing visual imagery. They are also 
known as shift invariant or space invariant artificial neural 
networks (SIANN), based on their shared-weights architecture 
and translation invariance characteristics. They have applications 
in image and video recognition, recommender systems, image 
classification, medical image analysis, natural language 
processing, brain-computer interfaces, and financial time series. 
CNNs are regularized versions of multilayer perceptrons.
Multilayer perceptrons usually mean fully connected networks,
that is, each neuron in one layer is connected to all neurons in
the next layer. The "fully-connectedness" of these networks
makes them prone to overfitting data. Typical ways of
regularization include adding some form of magnitude 
measurement of weights to the loss function. CNNs take a 
different approach towards regularization: they take advantage of 
the hierarchical pattern in data and assemble more complex 
patterns using smaller and simpler patterns. CNNs use relatively 
little pre-processing compared to other image classification 
algorithms. This means that the network learns the filters that in 
traditional algorithms were hand-engineered. This independence 
from prior knowledge and human effort in feature design is a 
major advantage. 








eSRETHTORIS LAYERS 


INPUT LAYER
It's the layer in which we give input to our model. The number of
neurons in this layer is equal to the total number of features in our
data (the number of pixels in the case of an image).


HIDDEN LAYER

The input from the input layer is then fed into the hidden layer.
There can be many hidden layers depending upon our model and
data size. Each hidden layer can have a different number of
neurons, generally greater than the number of features.
The output from each layer is computed by matrix multiplication
of the output of the previous layer with the learnable weights of that
layer, then addition of learnable biases, followed by an
activation function, which makes the network nonlinear.


OUTPUT LAYER
The output from the hidden layer is then fed into a logistic
function like sigmoid or softmax, which converts the output of
each class into a probability score for each class.


[Figure: neural network with an input layer, hidden layer 1, and hidden layer 2]
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Convolution Neural Networks or covnets are neural networks 
that share their parameters. Imagine you have an image. It can be
represented as a cuboid having a length and width (the dimensions of
the image) and a height (as images generally have red, green, and
blue channels).





Now imagine taking a small patch of this image and running a
small neural network on it, with, say, k outputs, and representing
them vertically. Now slide that neural network across the whole
image. As a result, we get another image with a different width,
height, and depth. Instead of just R, G, and B channels, we now
have more channels but smaller width and height. This operation is
called convolution. If the patch size is the same as that of the image, it
will be a regular neural network. Because of this small patch, we
have fewer weights.
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A covnets is a Sequence of layers, and every layer transforms one 
volume to another through a differentiable function.


TYPES OF LAYERS


Let's take an example by running a covnet on an image of
dimension 32 x 32 x 3.


Input Layer: This layer holds the raw input of the image with
width 32, height 32, and depth 3.

Convolution Layer: This layer computes the output volume by
computing the dot product between each filter and an image patch.
Suppose we use a total of 12 filters for this layer; we'll get an output
volume of dimension 32 x 32 x 12.

Activation Function Layer: This layer applies an element-wise
activation function to the output of the convolution layer. Some
common activation functions are ReLU: max(0, x), Sigmoid:
1/(1+e^-x), Tanh, Leaky ReLU, etc. The volume remains
unchanged, hence the output volume will have dimension
32 x 32 x 12.

Pool Layer: This layer is periodically inserted in the covnet,
and its main function is to reduce the size of the volume, which
makes the computation faster, reduces memory use, and also
helps prevent overfitting. Two common types of pooling
layers are max pooling and average pooling. If we use a max
pool with 2 x 2 filters and stride 2, the resultant volume will
be of dimension 16 x 16 x 12.

Fully-Connected Layer: This layer is a regular neural network
layer that takes input from the previous layer,
computes the class scores, and outputs a 1-D array of size
equal to the number of classes.
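
The volume sizes quoted in this walkthrough follow from a simple size formula, sketched below together with a tiny max-pooling example. The text does not state the filter size or padding of the convolution layer, so the 3 x 3 filter with padding 1 used here is an assumption picked because it keeps the 32 x 32 spatial size. The helper names and the 4 x 4 feature map are also illustrative.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def max_pool_2x2(matrix):
    """2 x 2 max pooling with stride 2 on a 2-D list with even dimensions."""
    return [
        [max(matrix[r][c], matrix[r][c + 1],
             matrix[r + 1][c], matrix[r + 1][c + 1])
         for c in range(0, len(matrix[0]), 2)]
        for r in range(0, len(matrix), 2)
    ]

# Assumed: 3 x 3 filters with padding 1, which preserves the 32 x 32 size.
conv = conv_output_size(32, kernel=3, stride=1, padding=1)
pool = conv_output_size(conv, kernel=2, stride=2)
print((conv, conv, 12), (pool, pool, 12))

# Hypothetical 4 x 4 feature map reduced to 2 x 2 by max pooling.
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 6]]
print(max_pool_2x2(feature_map))
```

The depth of 12 comes from the number of filters and is untouched by pooling, which only shrinks width and height.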


[Figure: max pool with 2x2 filters and stride 2]
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RECURRENT NEURAL 
NETWORK 


A recurrent neural network (RNN) is a class of artificial neural 
networks where connections between nodes form a directed graph 
along a temporal sequence. This allows it to exhibit temporal 
dynamic behavior. Derived from feedforward neural networks, 
RNNs can use their internal state (memory) to process variable 
length sequences of inputs. This makes them applicable to tasks 
such as unsegmented, connected handwriting recognition
or speech recognition. The term "recurrent neural network" is
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with a similar general structure, where one is finite impulse and 

the other is infinite impulse. Both classes of networks exhibit 
temporal dynamic behavior. A finite impulse recurrent network
is a directed acyclic graph that can be unrolled and replaced with
a strictly feedforward neural network, while an infinite impulse
recurrent network is a directed cyclic graph that cannot be
unrolled. Both finite impulse and infinite impulse recurrent 
networks can have additional stored states, and the storage can be 
under direct control by the neural network. The storage can also 
be replaced by another network or graph, if that incorporates time 
delays or has feedback loops. Such controlled states are referred
to as gated state or gated memory, and are part of long short-term
memory networks (LSTMs) and gated recurrent units. This is also 
called Feedback Neural Network (FNN). 
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The Recurrent Neural Network consists of multiple fixed 
activation function units, one for each time step. Each unit has
an internal state, which is called the hidden state of the unit. This
hidden state signifies the past knowledge that the network
currently holds at a given time step. This hidden state is updated
at every time step to signify the change in the knowledge of the
network about the past.


$$h_t = f_W(x_t, h_{t-1})$$


h_t - The new hidden state
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Lt - The current input 


f_W - The fixed function with trainable weights


Note that h_0 is the initial hidden state of the network. Typically, it is
a vector of zeros, but it can have other values also. One method
is to encode the presumptions about the data into the initial
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determine to the tone of a speech given by a renowned person, 
the person's past speeches' tones may be encoded into the initial
hidden state. Another technique is to make the initial hidden
state a trainable parameter. Although these techniques add
small nuances to the network, initializing the hidden state vector
to zeros is typically an effective choice.








WORKING OF EACH RECURRENT UNIT


• Take as input the previous hidden state vector and the current
input vector. Note that since the hidden state and current
input are treated as vectors, each element in the vector is
placed in a different dimension, which is orthogonal to the
other dimensions. Thus each element, when multiplied by
another element, gives a non-zero value only when the
elements involved are non-zero and are in the
same dimension.

• Element-wise multiply the hidden state vector by the hidden
state weights, and similarly perform the element-wise
multiplication of the current input vector and the current
input weights. This generates the parameterized hidden state
vector and current input vector.

• Perform the vector addition of the two parameterized
vectors and then calculate the element-wise hyperbolic
tangent to generate the new hidden state vector.
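
The three steps can be sketched for small vectors as follows. The code mirrors the element-wise description given here, although many practical RNN implementations use full weight matrices instead. The weight values and the two input vectors are hypothetical.

```python
import math

def rnn_step(h_prev, x_t, w_h, w_x):
    """One recurrent update: h_t = tanh(w_h * h_prev + w_x * x_t), element-wise."""
    return [math.tanh(wh * h + wx * x)
            for h, x, wh, wx in zip(h_prev, x_t, w_h, w_x)]

h = [0.0, 0.0]        # initial hidden state (vector of zeros)
w_h = [0.5, -0.5]     # hypothetical hidden state weights
w_x = [1.0, 2.0]      # hypothetical current input weights

# Feed two time steps; the second input is all zeros, so only the
# remembered hidden state contributes to the new hidden state.
for x_t in ([1.0, 0.5], [0.0, 0.0]):
    h = rnn_step(h, x_t, w_h, w_x)
    print(h)
```

The second printed vector is non-zero even though its input was zero, which shows the hidden state carrying information forward from the earlier step.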


[Figure: recurrent unit. Steps: Calculate Parameterized Vectors; Calculate Element-wise hyperbolic tangent]





During the training of the recurrent network, the network also 
generates an output at each time step. This output is used to 
train the network using gradient descent. 





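Let the predicted output of the network at any time step be
$\bar{y}_t$ and the actual output be $y_t$. Then the error at each
time step is given by

$$E_t = -y_t \log(\bar{y}_t)$$

The total error is given by the summation of the errors at all the
time steps:

$$E = \sum_t E_t \;\Rightarrow\; E = \sum_t -y_t \log(\bar{y}_t)$$

Similarly, the value $\partial E / \partial W$ can be calculated as the summation of
gradients at each time step:

$$\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$$

Using the chain rule of calculus and using the fact that the
output at a time step t is a function of the current hidden state of
the recurrent unit, the following expression arises:

$$\frac{\partial E_t}{\partial W} = \frac{\partial E_t}{\partial \bar{y}_t} \frac{\partial \bar{y}_t}{\partial h_t} \frac{\partial h_t}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial h_{t-2}} \cdots \frac{\partial h_0}{\partial W}$$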
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Thus, Back-Propagation Through Time only differs from a typical 
Back-Propagation in the fact that the errors at each time step are
summed up to calculate the total error.
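
The summation of per-step errors can be sketched numerically. The example assumes the per-step error is the negative log of the predicted value weighted by the actual output, and the three-step targets and predictions are hypothetical.

```python
import math

def step_error(y_true, y_pred):
    """Per-step error: minus the actual output times the log of the prediction."""
    return -y_true * math.log(y_pred)

def total_error(targets, predictions):
    """Total error: the sum of the per-step errors over all time steps."""
    return sum(step_error(y, p) for y, p in zip(targets, predictions))

# Hypothetical three-step sequence of actual outputs and predictions.
targets = [1.0, 1.0, 1.0]
predictions = [0.9, 0.5, 0.8]
print(total_error(targets, predictions))
```

Because the total is a plain sum, its gradient with respect to the weights is likewise the sum of the per-step gradients.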





GENERATIVE ADVERSARIAL 
NETWORK 


A generative adversarial network (GAN) is a class of machine
learning frameworks designed by Ian Goodfellow and his
colleagues in 2014. Two neural networks contest with each other
in a game (in the form of a zero-sum game, where one agent's
gain is another agent's loss). Given a training set, this technique
learns to generate new data with the same statistics as the training
set. For example, a GAN trained on photographs can generate new 
photographs that look at least superficially authentic to human 
observers, having many realistic characteristics. Though originally 
proposed as a form of generative model for unsupervised learning, 
GANs have also proven useful for semi-supervised learning, fully 
supervised learning, and reinforcement learning. 


Generative Adversarial Network

• Generative: To learn a generative model, which describes
how data is generated in terms of a probabilistic model.

• Adversarial: The training of a model is done in an adversarial
setting.

• Networks: Use deep neural networks as the artificial
intelligence (AI) algorithms for training purposes.


In GANs, there is a generator and a discriminator. The Generator
generates fake samples of data (be it an image, audio, etc.) and
tries to fool the Discriminator. The Discriminator, on the other
hand, tries to distinguish between the real and fake samples. The
Generator and the Discriminator are both neural networks, and
they run in competition with each other in the training
phase. The steps are repeated several times, and the
Generator and Discriminator get better and better at their
respective jobs after each repetition. The working can be
visualized by the diagram given below:


[Figure: GAN workflow diagram; recoverable label: Real Data Samples]
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Here, the generative model captures the distribution of data and 
is trained so that it maximizes the probability of the
Discriminator making a mistake. The Discriminator, on the other
hand, is based on a model that estimates the probability that a
sample came from the training data rather than from the
Generator. GANs are formulated as a minimax game in which the
Discriminator tries to maximize its reward V(D, G) and the
Generator tries to minimize the Discriminator's reward or, in
other words, to maximize the Discriminator's loss. This can be
described mathematically by the formula below:


$$\min_G \max_D V(D, G)$$

$$V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]$$
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G = Generator 
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Pdata(x) = distribution of real data 
P(z) = distribution of the noise fed to the Generator
x = sample from Pdata(x)
z = sample from P(z)
D(x) = Discriminator network
G(z) = Generator network
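To make the value function concrete, the sketch below estimates V(D, G) from a handful of discriminator outputs. It is an illustration only: `gan_value` is a hypothetical helper, and the arrays stand in for D(x) on real samples and D(G(z)) on generated samples rather than coming from trained networks.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs in (0, 1)."""
    # First term: mean of log D(x) over real samples.
    # Second term: mean of log(1 - D(G(z))) over generated samples.
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident Discriminator: high scores on real data, low scores on fakes.
confident = gan_value(np.array([0.9, 0.8, 0.95]), np.array([0.1, 0.2, 0.05]))

# A fully fooled Discriminator: it outputs 0.5 for every sample.
fooled = gan_value(np.full(3, 0.5), np.full(3, 0.5))

print(confident)
print(fooled)
```

When the Discriminator outputs 0.5 everywhere, both terms equal log 0.5, so the estimate is -2 log 2 (about -1.386). A Discriminator that separates real from fake pushes the value up towards 0, which is why it maximizes V while the Generator tries to pull it down.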


So, basically, training a GAN has two parts:

Part 1: The Discriminator is trained while the Generator is idle. In
this phase, the Generator is only forward propagated and no
back-propagation is applied to it. The Discriminator is trained on
real data for n epochs to see whether it can correctly predict them
as real. In the same phase, the Discriminator is also trained on the
fake data generated by the Generator to see whether it can
correctly predict them as fake.

Part 2: The Generator is trained while the Discriminator is idle.
After the Discriminator has been trained on the fake data
generated by the Generator, we take its predictions and use the
results to train the Generator, so that it improves on its previous
state and tries to fool the Discriminator.


Vanilla GAN: This is the simplest type of GAN. Here, the Generator
and the Discriminator are simple multi-layer perceptrons. In a
vanilla GAN, the algorithm is really simple: it tries to optimize the
mathematical equation using stochastic gradient descent.

Conditional GAN (CGAN): CGAN can be described as a deep
learning method in which some conditional parameters are put
into place. In CGAN, an additional parameter 'y' is added to the
Generator for generating the corresponding data. Labels are also
fed into the input of the Discriminator to help it distinguish the
real data from the fake generated data.

Deep Convolutional GAN (DCGAN): DCGAN is one of the most
popular and most successful implementations of GAN. It is
composed of ConvNets in place of multi-layer perceptrons. The
ConvNets are implemented without max pooling, which is
replaced by convolutional stride. Also, the layers are not fully
connected.

Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a
linear invertible image representation consisting of a set of
band-pass images, spaced an octave apart, plus a low-frequency
residual. This approach uses multiple Generator and
Discriminator networks at different levels of the Laplacian
pyramid. It is mainly used because it produces very
high-quality images. The image is first down-sampled at each
layer of the pyramid and then up-scaled again at each layer
in a backward pass, where the image acquires some noise from
the Conditional GAN at these layers until it reaches its original
size.

Super Resolution GAN (SRGAN): SRGAN, as the name suggests, is
a way of designing a GAN in which a deep neural network is used
along with an adversarial network in order to produce
higher-resolution images. This type of GAN is particularly useful
for optimally up-scaling native low-resolution images to enhance
their details while minimizing errors.


LINEAR REGRESSION VS 
LOGISTIC REGRESSION 


Linear regression and logistic regression are two famous
machine learning algorithms that come under the supervised
learning technique. Since both algorithms are supervised in
nature, they use labeled datasets to make predictions. The main
difference between them is how they are used: linear regression
is used for solving regression problems, whereas logistic
regression is used for solving classification problems.


Linear Regression Logistic Regression 
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In statistics, linear regression is a linear approach to 
modelling the relationship between a scalar response and one 
or more explanatory variables (also known as dependent and 
independent variables). The case of one explanatory variable 
is called simple linear regression; for more than one, the 
process is called multiple linear regression. 

Linear regression is one of the simplest machine learning
algorithms. It comes under the supervised learning technique
and is used for solving regression problems.

It is used for predicting a continuous dependent variable
with the help of independent variables.

The goal of linear regression is to find the best-fit line
that can accurately predict the output for the continuous
dependent variable.

If a single independent variable is used for prediction, it is
called simple linear regression; if there is more than one
independent variable, it is called multiple linear regression.

By finding the best-fit line, the algorithm establishes the
relationship between the dependent variable and the independent
variables. The relationship should be linear in nature.

The output of linear regression should only be continuous
values such as price, age, salary, etc.


$$y = a_0 + a_1 x + \varepsilon$$

where $a_0$ and $a_1$ are the coefficients and $\varepsilon$ is the
error term.
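As a small worked example of the equation y = a0 + a1 x, the coefficients can be recovered from data by least squares. The sketch below uses `np.polyfit` on noise-free toy data generated from a0 = 2 and a1 = 3; the data values are assumptions chosen so that the expected coefficients are known in advance.

```python
import numpy as np

# Toy data generated from y = 2 + 3x with no noise (illustrative values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# np.polyfit with deg=1 returns the least-squares slope first, then the intercept.
a1, a0 = np.polyfit(x, y, deg=1)
print(a0, a1)

# Predict the continuous output for a new input.
x_new = 5.0
print(a0 + a1 * x_new)
```

With real, noisy data the fitted coefficients would only approximate the underlying values, and the residuals play the role of the error term.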


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e Logistic regression is a statistical model that in its basic form 
uses a logistic function to model a binary dependent variable,
although many more complex extensions exist. In regression
analysis, logistic regression (or logit regression) is
estimating the parameters of a logistic model
(a form of binary regression).

• Logistic regression is one of the most popular machine
learning algorithms that come under supervised learning
techniques.

• It can be used for classification as well as for regression
problems.

• Logistic regression is used to predict a categorical
dependent variable with the help of independent variables.

• The output of a logistic regression model can only be
between 0 and 1.

• Logistic regression can be used where the probabilities
between two classes are required, such as whether it will rain
today or not, either 0 or 1, true or false, etc.

• Logistic regression is based on the concept of maximum
likelihood estimation. According to this estimation, the
observed data should be most probable.

• The activation function used is known as the sigmoid function,
and the curve obtained is called the sigmoid curve or S-curve.
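The sigmoid function mentioned above maps any real number into the interval (0, 1), which is what lets the output be read as a probability. The following sketch assumes a single-feature model; `predict_proba` and the coefficient values are hypothetical, since in practice the coefficients come from maximum likelihood estimation.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, a0, a1):
    """Probability of class 1 for a single-feature logistic model."""
    return sigmoid(a0 + a1 * x)

# Hypothetical coefficients used purely for illustration.
x = np.array([-2.0, 0.0, 2.0])
p = predict_proba(x, a0=0.0, a1=1.5)

# Threshold the probabilities at 0.5 to obtain class labels.
labels = (p >= 0.5).astype(int)
print(p)
print(labels)
```

The input 0 lands exactly on the midpoint of the S-curve, so its probability is 0.5; inputs far to either side approach 0 or 1 without ever reaching them.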
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RIDGE REGRESSION 


Ridge regression is a model tuning method that is used to
analyze any data that suffers from multicollinearity. This method
performs L2 regularization. When multicollinearity occurs, the
least-squares estimates are unbiased but their variances are
large, which results in predicted values that are far from the
actual values.

Lambda (λ) is the penalty term. λ is denoted by the alpha
parameter in the ridge function, so by changing the value of
alpha we control the penalty term. The higher the value of
alpha, the bigger the penalty, and therefore the magnitude of
the coefficients is reduced.
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e Where Y is the dependent variable, X represents the 
independent variables, B is the vector of regression coefficients
to be estimated, and e represents the errors (residuals).

• Once the lambda penalty is added to this equation, the
variance that is not accounted for by the general model is
taken into consideration. After the data is ready and identified
as suitable for L2 regularization, there are steps that one can
undertake.


The bias-variance trade-off is generally complicated when it
comes to building ridge regression models on an actual dataset.
However, the general trend to remember is:

• The bias increases as λ increases.

• The variance decreases as λ increases.
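The shrinking effect of the penalty can be seen with the closed-form ridge solution. The sketch below is a minimal illustration under stated assumptions: `ridge_fit` is a hypothetical helper with no intercept term, and the data are synthetic with two nearly collinear predictors.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)          # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

# Larger penalties shrink the coefficient vector towards zero.
norms = []
for lam in (0.0, 1.0, 100.0):
    beta = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(beta))
    print(lam, beta, norms[-1])
```

With lam = 0 this reduces to ordinary least squares, whose coefficients can be large and unstable under multicollinearity. As lam grows, the coefficient norm can only decrease, which is the variance reduction bought at the cost of extra bias.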

















The main idea behind ridge regression is to find a line that doesn't fit the
training data as well. In other words, by starting with a slightly worse fit,
ridge regression can provide better long-term predictions.
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Figure 2: The Geometric interpretation of principal components and 
shrinkage by ridge regression. 





BIAS-VARIANCE TRADEOFF 


In statistics and machine learning, the bias-variance tradeoff is 
the property of a model that the variance of the parameter 
estimates across samples can be reduced by increasing the bias 
in the estimated parameters. The bias-variance dilemma or bias- 
variance problem is the conflict in trying to simultaneously 
minimize these two sources of error that prevent supervised 
learning algorithms from generalizing beyond their training set. 


BIAS


Bias is the difference between the values predicted by the ML
model and the correct values. High bias gives a large error on
training as well as testing data. It is recommended that an
algorithm be low-biased to avoid the problem of underfitting.
With high bias, the predicted data follow a straight line and so
do not fit the data in the data set accurately. Such fitting is
known as underfitting of data. This happens when the hypothesis
is too simple or linear in nature.


$$h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$$
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The variability of model prediction for a given data point which 
tells us the spread of our data is called the variance of the model.
A model with high variance fits the training data very closely and
thus does not fit accurately on data it has not seen before. As a
result, such models perform very well on training data but have
high error rates on test data. A model with high variance is said
to overfit the data. Overfitting means fitting the training set
accurately with a complex curve and a high-order hypothesis, but
it is not a solution because the error on unseen data is high.
While training a model, variance should be kept low.


$$h_\theta(x) = g(\theta_0 + \theta_1 x + \dots + \theta_n x^n)$$
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If the algorithm is too simple (hypothesis with linear eq.) then it 
may have high bias and low variance and is therefore
error-prone. If the algorithm is too complex (hypothesis with a
high-degree eq.), it may have high variance and low bias. In the
latter condition, the model will not perform well on new entries.
Between these two conditions lies a compromise known as the
trade-off, or bias-variance trade-off.
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Pe: © An algorithm can't be more 


complex and less complex at the same time.


The best fit is given by the hypothesis at the trade-off point.
The error-versus-complexity graph illustrates the trade-off. The
trade-off point is the best point chosen for training the
algorithm, as it gives low error on training as well as testing
data.
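The trade-off can be explored by fitting polynomials of increasing degree to the same noisy sample and comparing training and test errors. The sketch below is an assumption-laden illustration: the underlying function (a sine wave), the noise level, and the degrees tried are all arbitrary choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    """Assumed underlying relationship used to generate the toy data."""
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 15)
x_test = np.linspace(0.03, 0.97, 15)
y_train = true_function(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = true_function(x_test) + rng.normal(scale=0.2, size=x_test.size)

def mse(deg):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, deg)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# Degree 1 is a simple (high-bias) model; degree 9 is a complex (high-variance) one.
errors = {deg: mse(deg) for deg in (1, 3, 9)}
for deg, (train, test) in errors.items():
    print(deg, round(train, 4), round(test, 4))
```

Training error can only go down as the degree increases, because each higher-degree model contains the lower-degree ones. Test error usually does not follow the same pattern, and the degree where it is lowest marks the trade-off point for the data at hand.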


KERNEL FUNCTION 


In machine learning, kernel machines are a class of algorithms for
pattern analysis, whose best known member is the support vector
machine (SVM). The general task of pattern analysis is to find and
study general types of relations (for example clusters, rankings,
principal components, correlations, classifications) in datasets. For
many algorithms that solve these tasks, the data in raw
representation have to be explicitly transformed into feature
vector representations via a user-specified feature map; in
contrast, kernel methods require only a user-specified kernel, i.e.,
a similarity function over pairs of data points in raw
representation. Algorithms capable of operating with kernels
include the kernel perceptron, support vector machines (SVM),
Gaussian processes, principal components analysis (PCA),
canonical correlation analysis, ridge regression, spectral
clustering, linear adaptive filters and many others. Any linear
model can be turned into a non-linear model by applying the
kernel trick to the model: replacing its features (predictors) by a
kernel function.





Common kernel functions 


* Some commonly used kernel functions & their shape:

* Polynomial: $K(a, b) = (1 + \sum_j a_j b_j)^d$

* Radial basis functions: $K(a, b) = \exp(-(a - b)^2 / 2\sigma^2)$

* Saturating, sigmoid-like: $K(a, b) = \tanh(c\, a^T b + h)$
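The three kernels listed above can be written directly as similarity functions over pairs of vectors. This is a minimal sketch: the function names and the default parameter values (`d`, `sigma`, `c`, `h`) are assumptions for illustration, and for vectors the squared difference in the radial basis function is taken as the squared Euclidean distance.

```python
import numpy as np

def polynomial_kernel(a, b, d=2):
    """Polynomial kernel: (1 + a . b) ** d."""
    return (1.0 + np.dot(a, b)) ** d

def rbf_kernel(a, b, sigma=1.0):
    """Radial basis function kernel: exp(-||a - b||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(a, b, c=1.0, h=0.0):
    """Saturating, sigmoid-like kernel: tanh(c * a . b + h)."""
    return np.tanh(c * np.dot(a, b) + h)

a = np.array([1.0, 2.0])
b = np.array([0.5, -1.0])

print(polynomial_kernel(a, b))
print(rbf_kernel(a, b))
print(sigmoid_kernel(a, b))
```

Each function takes two points in their raw representation and returns a single similarity score, which is all a kernel method needs; no explicit feature map is ever computed.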





STANDARD KERNEL FUNCTION EQUATION

$$K(x) = \begin{cases} 1, & \text{if } \|x\| \le 1 \\ 0, & \text{otherwise} \end{cases}$$

GAUSSIAN KERNEL

$$K(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}$$

GAUSSIAN KERNEL RADIAL BASIS FUNCTION (RBF)

$$K(x, y) = e^{-\gamma \|x - y\|^2}$$

$K(x, x_1) + K(x, x_2)$ (simplified formula)
$K(x, x_1) + K(x, x_2) > 0$ (green)
$K(x, x_1) + K(x, x_2) = 0$ (red)

SIGMOID KERNEL

$$K(x, y) = \tanh(\gamma\, x^T y + r)$$

POLYNOMIAL KERNEL

$$K(x, y) = (1 + x \cdot y)^d$$

SOME EQUATIONS

Sigmoid: $K(x, y) = \tanh(a\, x \cdot y + b)$
Gaussian RBF: $K(x, y) = \exp(-a \|x - y\|^2)$
Exponential RBF: $K(x, y) = \exp(-a \|x - y\|)$








GAUSSIAN MIXTURE MODEL 


A Gaussian mixture model is a probabilistic model that assumes 
all the data points are generated from a mixture of a finite 
number of Gaussian distributions with unknown parameters. One 
can think of mixture models as generalizing k-means clustering to 
incorporate information about the covariance structure of the 
data as well as the centers of the latent Gaussians. 


Introduction to GMM 


• Gaussian: "Gaussian is a characteristic symmetric "bell curve"
shape that quickly falls off towards 0 (practically)"

• Mixture Model: "mixture model is a probabilistic model which
assumes the underlying data to belong to a mixture distribution"


Gaussian Mixture Model


* Data with D attributes, from Gaussian sources $c_1, \dots, c_k$

- how typical is $x_i$ under source $c$:

$$P(x_i \mid c) = \frac{1}{\sqrt{(2\pi)^D |\Sigma_c|}} \exp\left\{-\frac{1}{2}(x_i - \mu_c)^T \Sigma_c^{-1} (x_i - \mu_c)\right\}$$

- how likely that $x_i$ came from $c$:

$$P(c \mid x_i) = \frac{P(x_i \mid c)\, P(c)}{\sum_{c'=1}^{k} P(x_i \mid c')\, P(c')}$$

- how important is $x_i$ for source $c$: $w_{i,c} = P(c \mid x_i) / (P(c \mid x_1) + \dots + P(c \mid x_n))$

- mean of attribute $a$ in items assigned to $c$: $\mu_{c,a} = w_{1,c}\, x_{1,a} + \dots + w_{n,c}\, x_{n,a}$

- covariance of $a$ and $b$ in items from $c$: $\Sigma_{c,a,b} = \sum_{i=1}^{n} w_{i,c} (x_{i,a} - \mu_{c,a})(x_{i,b} - \mu_{c,b})$

- prior: how many items assigned to $c$: $P(c) = \frac{1}{n}(P(c \mid x_1) + \dots + P(c \mid x_n))$
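The quantities on the slide above are the ingredients of one expectation-maximization (EM) iteration. The sketch below applies them to one-dimensional data (D = 1), where the covariance reduces to a variance. The helper names `gaussian_pdf` and `em_step`, the toy data, and the starting parameters are all assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_step(x, mu, var, prior):
    """One EM iteration for a 1-D Gaussian mixture."""
    # E-step: P(c | x_i) for every point i (rows) and source c (columns).
    likelihood = gaussian_pdf(x[:, None], mu[None, :], var[None, :])
    joint = likelihood * prior[None, :]
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step: importance weights w_{i,c}, normalized per source.
    weights = resp / resp.sum(axis=0, keepdims=True)
    mu_new = (weights * x[:, None]).sum(axis=0)
    var_new = (weights * (x[:, None] - mu_new[None, :]) ** 2).sum(axis=0)
    prior_new = resp.mean(axis=0)
    return mu_new, var_new, prior_new, resp

# Two well-separated groups of points (toy data).
x = np.array([-2.1, -1.9, -2.0, 3.0, 3.2, 2.8])
mu = np.array([-1.0, 1.0])      # initial guesses for the two means
var = np.array([1.0, 1.0])      # initial variances
prior = np.array([0.5, 0.5])    # initial source priors

for _ in range(10):
    mu, var, prior, resp = em_step(x, mu, var, prior)

print(mu)
print(var)
print(prior)
```

On this toy data the means settle near the centres of the two groups and each prior approaches one half, since three points end up assigned to each source. Unlike k-means, the responsibilities are soft, so every point contributes to every source in proportion to P(c | x_i).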





ANOMALY DETECTION 


Anomaly detection is the technique of identifying rare events or
observations that raise suspicion by being statistically
different from the rest of the observations. Such "anomalous"
behaviour typically translates to some kind of problem, such as
credit card fraud, a failing machine in a server, or a cyber attack.

• Point Anomaly: A tuple in a dataset is said to be a point
anomaly if it is far off from the rest of the data.
• Contextual Anomaly: An observation is a contextual anomaly
if it is an anomaly because of the context of the observation.
• Collective Anomaly: A set of data instances, taken together,
helps in finding an anomaly.

Supervised Anomaly Detection: This method requires a labeled
dataset containing both normal and anomalous samples to
construct a predictive model to classify future data points. The
most commonly used algorithms for this purpose are supervised
neural networks, support vector machines, the K-nearest
neighbors classifier, etc.

Unsupervised Anomaly Detection: This method does not require
any training data and instead assumes two things about the data:
only a small percentage of the data is anomalous, and any
anomaly is statistically different from the normal samples. Based
on these assumptions, the data is clustered using a similarity
measure, and the data points that are far off from the clusters
are considered to be anomalies.
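A very simple unsupervised check for point anomalies is the z-score: how many standard deviations a value lies from the mean. Note that this is a plain statistical rule rather than the clustering approach described above; the function name `zscore_anomalies`, the threshold, and the sensor-style readings are assumptions for illustration.

```python
import numpy as np

def zscore_anomalies(values, threshold=3.0):
    """Return a boolean mask marking values far from the mean in std units."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

# Nine ordinary readings and one value that is far off from the rest.
readings = np.array([10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0])

# With only ten points, no z-score can exceed 3, so a lower threshold is used.
mask = zscore_anomalies(readings, threshold=2.5)
print(readings[mask])
```

The approach relies on the same two assumptions stated above: anomalies are rare, and they are statistically different from the normal samples. It handles point anomalies only; contextual and collective anomalies need methods that take the surrounding observations into account.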
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PAYTHON ee] _)= 


import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
# Simple print and check the listing
print(x)
print(x_with_nan)

y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)
print(y); print(y_with_nan)
print(z); print(z_with_nan)


import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
# Simple print and check the listing

y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)

# Mean in pure Python
mean_ = sum(x) / len(x)
print(mean_)

# Mean with the statistics module
mean1 = statistics.mean(x)
print(mean1)

# Mean with NumPy
mean2 = np.mean(y)
print(mean2)


OUTPUT 


8.7
8.7
8.7


Mean


The sample mean, also called the sample arithmetic mean or simply the average, is the arithmetic average of
all the items in a dataset. The mean of a dataset x is mathematically expressed as $\sum_i x_i / n$, where i = 1, 2, ..., n. In
other words, it's the sum of all the elements $x_i$ divided by the number of items in the dataset x.


import math 

import statistics 
import numpy as np 
import scipy.stats 
import pandas as pd 


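x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

# Weighted mean using index-based access
mean1 = sum(w[i] * x[i] for i in range(len(x))) / sum(w)
print(mean1)

# Weighted mean using zip(); x_ and w_ avoid shadowing the lists x and w
mean2 = sum(x_ * w_ for (x_, w_) in zip(x, w)) / sum(w)
print(mean2)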


OUTPUT


6.95
6.95

The weighted mean, also called the weighted arithmetic mean or weighted average, is a generalization of the
arithmetic mean that enables you to define the relative contribution of each data point to the result.


import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]

# Harmonic mean in pure Python
hmean = len(x) / sum(1 / item for item in x)
print(hmean)

# Harmonic mean with the statistics module
hmean = statistics.harmonic_mean(x)
print(hmean)

# Weighted mean, repeated for comparison
mean1 = sum(w[i] * x[i] for i in range(len(x))) / sum(w)
print(mean1)

mean2 = sum(x_ * w_ for (x_, w_) in zip(x, w)) / sum(w)
print(mean2)


OUTPUT


2.7613412228796843
2.7613412228796843
6.95
6.95


import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

x = [8.0, 1, 2.5, 4, 28.0]  # same dataset as in the earlier examples

# Geometric mean in pure Python: multiply all items, then take the n-th root
gmean = 1
for item in x:
    gmean *= item

gmean **= 1 / len(x)
print(gmean)


Harmonic Mean


The harmonic mean is the reciprocal of the mean of the reciprocals of all items in the dataset: $n / \sum_i (1/x_i)$,
where i = 1, 2, ..., n and n is the number of items in the dataset x.


Geometric Mean


The geometric mean is the n-th root of the product of all n elements $x_i$ in a dataset x: $\sqrt[n]{\prod_i x_i}$, where i = 1, 2, ..., n.
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import math 

import statistics
import numpy as np
import scipy.stats
import pandas as pd

x = [8.0, 1, 2.5, 4, 28.0]  # same dataset as in the earlier examples

# Median in pure Python
n = len(x)
if n % 2:
    median_ = sorted(x)[round(0.5 * (n - 1))]
else:
    x_ord, index = sorted(x), round(0.5 * n)
    median_ = 0.5 * (x_ord[index - 1] + x_ord[index])

print(median_)

# Median with the statistics module
median_ = statistics.median(x)
print(median_)

# Dropping the last item leaves an even number of elements
median_ = statistics.median(x[:-1])
print(median_)

print(statistics.median_low(x[:-1]))
print(statistics.median_high(x[:-1]))


OUTPUT 


4
4
3.25
2.5
4


Median


The sample median is the middle element of a sorted dataset. The dataset can be sorted in increasing or
decreasing order. If the number of elements n of the dataset is odd, then the median is the value at the middle
position: 0.5(n + 1). If n is even, then the median is the arithmetic mean of the two values in the middle, that is,
the items at the positions 0.5n and 0.5n + 1.


import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

u = [2, 3, 2, 8, 12]
v = [12, 15, 12, 15, 21, 15, 12]

# Mode in pure Python: the item with the highest count
mode_ = max((u.count(item), item) for item in set(u))[1]
print(mode_)

mode_ = max((v.count(item), item) for item in set(v))[1]
print(mode_)

# Mode with SciPy
u, v = np.array(u), np.array(v)
mode_ = scipy.stats.mode(u)
print(mode_)

mode_ = scipy.stats.mode(v)
print(mode_)

# Mode with Pandas; Series.mode() returns every modal value
u, v, w = pd.Series(u), pd.Series(v), pd.Series([2, 2, math.nan])
print(u.mode())
print(v.mode())
print(w.mode())


OUTPUT 


2
15
ModeResult(mode=array([2]), count=array([2]))
ModeResult(mode=array([12]), count=array([3]))
0    2
dtype: int64
0    12
1    15
dtype: int64
0    2.0
dtype: float64


Mode


The sample mode is the value in the dataset that occurs most frequently. If there isn't a single such value, then
the set is multimodal since it has multiple modal values.





import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

a = np.array([[1, 1, 1],
              [2, 3, 1],
              [4, 9, 2],
              [8, 27, 4],
              [16, 1, 1]])
print(a)

# Statistics over the whole array (axis=None)
print(np.mean(a))
print(a.mean())
print(np.median(a))
print(a.var(ddof=1))

# Statistics per column (axis=0) and per row (axis=1)
print(np.mean(a, axis=0))
print(a.mean(axis=0))
print(np.mean(a, axis=1))
print(a.mean(axis=1))
print(np.median(a, axis=0))
print(np.median(a, axis=1))
print(a.var(axis=0, ddof=1))
print(a.var(axis=1, ddof=1))

print(scipy.stats.gmean(a, axis=0))


OUTPUT


[[ 1  1  1]
 [ 2  3  1]
 [ 4  9  2]
 [ 8 27  4]
 [16  1  1]]
5.4
5.4
2.0
53.40000000000001
[6.2 8.2 1.8]
[6.2 8.2 1.8]
[ 1.  2.  5. 13.  6.]
[ 1.  2.  5. 13.  6.]
[4. 3. 1.]
[1. 2. 4. 8. 1.]
[ 37.2 121.2   1.7]
[  0.   1.  13. 151.  75.]
[4.         3.73719282 1.51571657]


Axes


The functions and methods you've used so far have one optional parameter called axis, which is essential for
handling 2D data. axis can take on any of the following values:


axis=None says to calculate the statistics across all data in the array. The examples above work like this.
This behavior is often the default in NumPy.


axis=0 says to calculate the statistics across all rows, that is, for each column of the array. This behavior is
often the default for SciPy statistical functions.


axis=1 says to calculate the statistics across all columns, that is, for each row of the array.


import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

# Same 2D array as in the axes example
a = np.array([[1, 1, 1], [2, 3, 1], [4, 9, 2], [8, 27, 4], [16, 1, 1]])

row_names = ['first', 'second', 'third', 'fourth', 'fifth']
col_names = ['A', 'B', 'C']
df = pd.DataFrame(a, index=row_names, columns=col_names)
print(df)

# Column-wise statistics (the default) and row-wise statistics (axis=1)
print(df.mean())
print(df.var())
print(df.mean(axis=1))
print(df.var(axis=1))

# A single column is a Series
print(df['A'])
print(df['A'].mean())
print(df['A'].var())

# The underlying NumPy array
print(df.values)
print(df.to_numpy())

# Summary statistics and access to individual entries
print(df.describe())
print(df.describe().at['mean', 'A'])
print(df.describe().at['50%', 'B'])


OUTPUT


         A   B  C
first    1   1  1
second   2   3  1
third    4   9  2
fourth   8  27  4
fifth   16   1  1
A    6.2
B    8.2
C    1.8
dtype: float64
A     37.2
B    121.2
C      1.7
dtype: float64
first      1.0
second     2.0
third      5.0
fourth    13.0
fifth      6.0
dtype: float64
first       0.0
second      1.0
third      13.0
fourth    151.0
fifth      75.0
dtype: float64
first      1
second     2
third      4
fourth     8
fifth     16
Name: A, dtype: int64
6.2
37.20000000000001
[[ 1  1  1]
 [ 2  3  1]
 [ 4  9  2]
 [ 8 27  4]
 [16  1  1]]
[[ 1  1  1]
 [ 2  3  1]
 [ 4  9  2]
 [ 8 27  4]
 [16  1  1]]
              A          B        C
count   5.00000   5.000000  5.00000
mean    6.20000   8.200000  1.80000
std     6.09918  11.009087  1.30384
min     1.00000   1.000000  1.00000
25%     2.00000   1.000000  1.00000
50%     4.00000   3.000000  1.00000
75%     8.00000   9.000000  2.00000
max    16.00000  27.000000  4.00000
6.2
3.0
DataFrames


The class DataFrame is one of the fundamental Pandas data types. It's very comfortable to work with because
it has labels for rows and columns.

import numpy as np
import scipy.stats
import pandas as pd
import matplotlib.pyplot as plt
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np.random. seed (seed=0) 
x = np.random.randn(1000)
y = np.random.randn(100)
z = np.random.randn(10)

fig, ax = plt.subplots()

# Horizontal box plots of the three samples, with mean lines shown
ax.boxplot((x, y, z), vert=False, showmeans=True, meanline=True,
           labels=('x', 'y', 'z'), patch_artist=True,
           medianprops={'linewidth': 2, 'color': 'green'},
           meanprops={'linewidth': 2, 'color': 'red'})

plt.show()


OUTPUT


/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning:
Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or
ndarrays with different lengths or shapes) is deprecated. If you meant to do this,
you must specify 'dtype=object' when creating the ndarray
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ere slaalLere (Ox, Joclin Sclcjes, Culm lens1 ei aie )) 
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jOdkie. s Slave |) 
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Visualizing Data: Histograms 


Histograms are particularly useful when there are a large number of unique values in a dataset. The histogram 
divides the values from a sorted dataset into intervals, also called bins. Often, all bins are of equal width, 
though this doesn't have to be the case. The values of the lower and upper bounds of a bin are called the bin 
edges. 
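The bins and bin edges described above can be computed without plotting by using `np.histogram`. The sample below is an illustrative assumption (1,000 draws from a standard normal distribution with a fixed seed); only the relationship between counts and edges matters here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)      # illustrative data

# counts[i] is the number of values between bin_edges[i] and bin_edges[i + 1].
counts, bin_edges = np.histogram(x, bins=10)

print(counts)
print(bin_edges)
```

Ten bins always produce eleven bin edges, and with an integer `bins` argument all bins have equal width. Passing an explicit sequence of edges instead gives bins of unequal width.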





import math 

import statistics
import numpy as np
import scipy.stats
import pandas as pd
import matplotlib.pyplot as plt

x, y, z = 128, 256, 1024

fig, ax = plt.subplots()
# Each slice is labeled and annotated with its percentage of the total
ax.pie((x, y, z), labels=('x', 'y', 'z'), autopct='%1.2f%%')

plt.show()
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Visualizing Data: Pie Charts 


Pie charts represent data with a small
number of labels and given relative
frequencies. They work well even with the
labels that can't be ordered (like nominal
data). A pie chart is a circle divided into
multiple slices. Each slice corresponds to a
single distinct label from the dataset and
has an area proportional to the relative
frequency associated with that label.
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xX = np.arange (21) 
y = np.random.randint(21, size=21)
# Error-bar sizes must be non-negative, so take absolute values
err = np.abs(np.random.randn(21))

fig, ax = plt.subplots()
ax.bar(x, y, yerr=err)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()


OUTPUT 


Visualizing Data: Bar Charts


Bar charts also illustrate data that correspond to given labels or discrete numeric values. They can show the
pairs of data from two datasets. Items of one set are the labels, while the corresponding items of the other are
their frequencies.
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> ans) Ola war-belel—m @-lap) 

x = np.arange(21)
y = 5 + 2 * x + 2 * np.random.randn(21)

# Fit a regression line and keep the slope, intercept, and correlation coefficient
slope, intercept, r, *__ = scipy.stats.linregress(x, y)
line = f'Regression line: y={intercept:.2f}+{slope:.2f}x, r={r:.2f}'

fig, ax = plt.subplots()
ax.plot(x, y, linewidth=0, marker='s', label='Data points')
ax.plot(x, intercept + slope * x, label=line)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend(facecolor='white')
plt.show()
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Visualizing Data: X-Y Plots 


The x-y plot or scatter plot represents the pairs of data from two datasets. The horizontal x-axis shows the
values from the set x, while the vertical y-axis shows the corresponding values from the set y. You can
optionally include the regression line and the correlation coefficient.


[Figure legend: Data points; Regression line: y=5.49+2.02x, r=0.99]
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STO @ ae a SC cl Vee none 

import numpy as np
import scipy.stats
import pandas as pd
import matplotlib.pyplot as plt

# x and y are the arrays from the x-y plot example

# ---- Heatmap of the covariance matrix
matrix = np.cov(x, y).round(decimals=2)
fig, ax = plt.subplots()
ax.imshow(matrix)
ax.grid(False)
ax.xaxis.set(ticks=(0, 1), ticklabels=('x', 'y'))
ax.yaxis.set(ticks=(0, 1), ticklabels=('x', 'y'))
ax.set_ylim(1.5, -0.5)
for i in range(2):
    for j in range(2):
        ax.text(j, i, matrix[i, j], ha='center', va='center', color='w')
plt.show()

# ---- Heatmap of the correlation coefficient matrix
matrix = np.corrcoef(x, y).round(decimals=2)
fig, ax = plt.subplots()
ax.imshow(matrix)
ax.grid(False)
ax.xaxis.set(ticks=(0, 1), ticklabels=('x', 'y'))
ax.yaxis.set(ticks=(0, 1), ticklabels=('x', 'y'))
ax.set_ylim(1.5, -0.5)
for i in range(2):
    for j in range(2):
        ax.text(j, i, matrix[i, j], ha='center', va='center', color='w')

plt.show()


OUTPUT 


Visualizing Data: Heatmaps


A heatmap can be used to
visually show a matrix. The colors
represent the numbers or
elements of the matrix. Heatmaps
are particularly useful for
illustrating the covariance and
correlation matrices.
PYTHON CODE


import numpy as np

import matplotlib.pyplot as plt

import scipy.linalg as la

# Array Attributes

a = np.array([1,3,-2,1])

print(a)

print(a.ndim) # Number of Dimensions.
print(a.shape) # Verify the shape.
print(a.size) # Verify the size


M = np.array([[1,2],[3,7],[-1,5]])

# Create a 2D (two-dimensional) NumPy array
print(M)

print(M.ndim) # Number of Dimensions.
print(M.shape) # Verify the shape.
print(M.size) # Verify the size


col = M[:,1]

# column from a 2D NumPy array

print(col)

print(col.ndim) # Number of Dimensions.
print(col.shape) # Verify the shape.
print(col.size) # Verify the size


column = np.array([2,7,5]).reshape(3,1)
print(column)

print('Dimensions:', column.ndim)

print('Shape:', column.shape)

print('Size:', column.size)





import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la
from numpy.linalg import matrix_power as mpow


# ------- Arithmetic Operations
M = np.array([[3,4],[-1,5]])


print(M)
print(M * M) # Elementwise product, not matrix multiplication
# ------- Matrix Multiplication


print(M @ M)


# We use the @ operator to do matrix multiplication with NumPy arrays:


A = np.array([[1,3],[-1,7]])
print(A)

B = np.array([[5,2],[1,2]])
print(B)

I = np.eye(2)

print(I)

# Combine the identity matrix with A and B.
print(2*I + 3*A - A@B)
# ------- Matrix Powers

M = np.array([[3,4],[-1,5]])
print(M)

print(mpow(M, 2))

print(mpow(M, 5))
print(M @ M @ M @ M @ M)
# Compare with the matrix multiplication operator
# ------- Transpose

print(M.T)

print(M @ M.T)

# ------- Inverse

A = np.array([[1,2],[3,4]])
print(A)

print(la.inv(A))

# ------- Trace

print(np.trace(A))

# ------- Determinant

A = np.array([[1,2],[3,4]])
print(A)

print(la.det(A))


OUTPUT


Matrix Operations and Functions


Arithmetic array operations +, -, /, * and ** are performed elementwise on NumPy arrays.
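
The difference between elementwise arithmetic and true matrix multiplication is easy to miss, so here is a minimal sketch of it. The matrix values are chosen only for illustration, and the variable names are assumptions made for this example.

```python
import numpy as np

# Illustrative 2x2 matrix (values chosen for this sketch)
M = np.array([[3, 4], [-1, 5]])

elementwise = M * M      # multiplies matching entries: [[3*3, 4*4], [(-1)*(-1), 5*5]]
squared = M ** 2         # elementwise power, equal to M * M
matrix_product = M @ M   # row-by-column matrix multiplication

print(elementwise)
print(matrix_product)
```

The two results differ in every entry here, which is why the @ operator (or a matrix power function) is needed whenever a linear-algebra product is intended.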


...
[[-1525  3236]
 [ -809    93]]
[[-1525  3236]
 [ -809    93]]
[[ 3 -1]
 [ 4  5]]
[[25 17]
 [17 26]]
[[1 2]
 [3 4]]
[[-2.   1. ]
 [ 1.5 -0.5]]
5
[[1 2]
 [3 4]]
-2.0





PYTHON CODE


import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la


A = np.array([[1,0],[0,-2]])
print(A)


results = la.eig(A)


print(results[0])
print(results[1])


eigvals, eigvecs = la.eig(A)
print(eigvals)


print(eigvecs)


eigvals = eigvals.real
print(eigvals)


lambda1 = eigvals[1]
print(lambda1)


v1 = eigvecs[:,1].reshape(2,1)
print(v1)


print(A @ v1)


print(lambda1 * v1)





import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la


x = np.array([0,3,8])
X = np.column_stack([[1,1,1],x,x**2])


print(X)


y = np.array([6,1,2]).reshape(3,1)


print(y)

a = la.solve(X,y)

print(a)

xs = np.linspace(0,8,20)

ys = a[0] + a[1]*xs + a[2]*xs**2
plt.plot(xs,ys,x,y,'b.',ms=20)
plt.show()

N = 10

x = np.arange(0,N)


y = np.random.randint(0,10,N)
plt.plot(x,y,'r.')
plt.show()


X = np.column_stack([x**k for k in range(0,N)])
print(X[:5,:5])

# We could also use the NumPy function np.vander

X = np.vander(x,increasing=True)

a = la.solve(X,y)

xs = np.linspace(0,N-1,200)
ys = sum([a[k]*xs**k for k in range(0,N)])
plt.plot(x,y,'r.',xs,ys)

plt.show()


OUTPUT


Linear System:
Polynomial Interpolation


[[ 1  0  0]
 [ 1  3  9]
 [ 1  8 64]]
[[6]
 [1]
 [2]]
[[ 6.        ]
 [-2.36666667]
 [ 0.23333333]]


[[  1   0   0   0   0]
 [  1   1   1   1   1]
 [  1   2   4   8  16]
 [  1   3   9  27  81]
 [  1   4  16  64 256]]























import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la


a0 = 2

a1 = 3

N = 100

x = np.random.rand(N)

noise = 0.1*np.random.randn(N)
y = a0 + a1*x + noise


plt.scatter(x,y);
plt.show()


OUTPUT Linear System: Fake Noisy Linear Data


import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la


a0 = 3

a1 = 5

a2 = 8

N = 1000

x = 2*np.random.rand(N) - 1 # Random numbers in the interval (-1,1)
noise = np.random.randn(N)

y = a0 + a1*x + a2*x**2 + noise


plt.scatter(x,y,alpha=0.5,lw=0);
plt.show()


OUTPUT Linear System: Fake Noisy Quadratic Data





import pandas as pd

import numpy as np

import seaborn as sns #visualisation

import matplotlib.pyplot as plt #visualisation


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files

uploaded = files.upload()

import io

df = pd.read_csv(io.BytesIO(uploaded['Soper.csv']))

# Replace Soper.csv with your own file name.


# To display the top 5 rows
df.head(5)

# To display the bottom 5 rows
df.tail(5)

# Checking the data type

df.dtypes

# Total number of rows and columns
df.shape

# Used to count the number of rows before removing the data
df.count()

# Dropping the duplicates

df = df.drop_duplicates()

df.head(5)

# Finding the null values.

print(df.isnull().sum())

# Dropping the missing values.

df = df.dropna()

df.count()

# After dropping the values


print(df.isnull().sum())
df_num = df.select_dtypes(include=['float64', 'int64'])


df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);


# The trailing ; avoids the verbose matplotlib output

corr = df_num.drop('Data value', axis=1).corr()

# We already examined SalePrice correlations

plt.figure(figsize=(12, 10))

sns.heatmap(corr[(corr >= 0.5) | (corr <= -0.4)],
            cmap='viridis', vmax=1.0, vmin=-1.0, linewidths=0.1,
            annot=True, annot_kws={"size": 8}, square=True);


OUTPUT 








Saving Soper.csv to Soper (18).csv


Series reference 
Period 

Data value 
Suppressed 
STATUS 

UNITS 
Magnitude 
Subject 

Group 

Series title 1 
Series title 2 
Series title 3 
Series title 4 
Series title 5 
dtype: int64
Series reference 
Period 

Data value 
Suppressed 
STATUS 

UNITS 
Magnitude 
Subject 

Group 

Series title 1 
Series title 2 
Series title 3 
Series title 4 
Series title 5 
dtype: int64 


import pandas as pd

import numpy as np

import seaborn as sns #visualisation

import matplotlib.pyplot as plt #visualisation


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files
uploaded = files.upload()

import io

df = pd.read_csv(io.BytesIO(uploaded['Soper.csv']))

# Replace Soper.csv with your own file name.

df.head()

df.info()


df.describe()


# Run this plot first, with the second plot line commented out.
sns.distplot(df['Data value'])


# Run this plot second, with the first plot line commented out.
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')


OUTPUT


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1832 entries, 0 to 1831
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Series reference  1832 non-null   object
 1   Period            1832 non-null   float64
 2   Data value        1832 non-null   float64
 3   Suppressed        0 non-null      float64
 4   STATUS            1832 non-null   object
 5   UNITS             1832 non-null   object
 6   Magnitude         1832 non-null   int64
 7   Subject           1832 non-null   object
 8   Group             1832 non-null   object
 9   Series title 1    1832 non-null   object
 10  Series title 2    1832 non-null   object
 11  Series title 3    1832 non-null   object
 12  Series title 4    1832 non-null   object
 13  Series title 5    0 non-null      float64
dtypes: float64(4), int64(1), object(9)
memory usage: 200.5+ KB


FutureWarning: `distplot` is
a deprecated function and will be removed in a future version. Please adapt your code to use
either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level
function for histograms).

warnings.warn(msg, FutureWarning)
Regression Analysis 


from sklearn.datasets import make_blobs
data = make_blobs(n_samples=200, n_features=2,
                  centers=4, cluster_std=1.8, random_state=101)
plt.scatter(data[0][:,0], data[0][:,1], c=data[1], cmap='rainbow')


from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4)
kmeans.fit(data[0])


kmeans.cluster_centers_


kmeans.labels_


f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10,6))
ax1.set_title('K Means')
ax1.scatter(data[0][:,0], data[0][:,1], c=kmeans.labels_, cmap='rainbow')
ax2.set_title("Original")
ax2.scatter(data[0][:,0], data[0][:,1], c=data[1], cmap='rainbow')


OUTPUT

[Figure: two scatter plots side by side, titled "K Means" and "Original"]


K Means Clustering





K-NEAREST NEIGHBORS IN SCIKIT-LEARN. 


from sklearn import datasets 
import pandas as pd 
import matplotlib.pyplot as plt


# Loading the Iris dataset from scikit-learn into the iris variable.
iris = datasets.load_iris()


# Prints the type/type object of iris
print(type(iris))
# <class 'sklearn.utils.Bunch'>


# Prints the dictionary keys of the iris data
print(iris.keys())


# Prints the type/type object of the given attributes
print(type(iris.data), type(iris.target))


# Prints the number of rows and columns in the dataset
print(iris.data.shape)


# Prints the target names of the data
print(iris.target_names)


# Load iris training dataset
X = iris.data


# Load iris target set
Y = iris.target


# Convert the dataset into a DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)


# Print the first five rows of the DataFrame.
print(df.head())


OUTPUT

<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])


<class 'numpy.ndarray'> <class 'numpy.ndarray'>
(150, 4)
['setosa' 'versicolor' 'virginica']
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2





SUPERVISED AND
UNSUPERVISED
LEARNING


PYTHON CODE


from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier


# Load iris dataset from sklearn
iris = datasets.load_iris()


# Declare an instance of the KNN classifier class with the number of neighbors.
knn = KNeighborsClassifier(n_neighbors=6)


# Fit the model with training data and target values
knn.fit(iris['data'], iris['target'])


# Provide data whose class labels are to be predicted
X = [
    [5.9, 1.0, 5.1, 1.8],
    [3.4, 2.0, 1.1, 4.8],
]


# Prints the data provided
print(X)


# Store predicted class labels of X
prediction = knn.predict(X)


# Prints the predicted class labels of X
print(prediction)





from sklearn import datasets, linear_model
import matplotlib.pyplot as plt


import numpy as np


# Load the diabetes dataset
diabetes = datasets.load_diabetes()


# Use only one feature for training
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]


# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]


# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
# Input data

print('Input Values')

print(diabetes_X_test)


# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)
# Predicted Data

print("Predicted Output Values")

print(diabetes_y_pred)


# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='red', linewidth=1)


plt.show()


OUTPUT


Input Values
[[ 0.07786339]
 [-0.03961813]
 [ 0.01103904]
 [-0.04069594]
 [-0.03422907]
 [ 0.00564998]
 [ 0.08864151]
 [-0.03315126]
 [-0.05686312]
 [-0.03099563]
 [ 0.05522933]
 [-0.06009656]
 [ 0.00133873]
 [-0.02345095]
 [-0.07410811]
 [ 0.01966154]
 [-0.01590626]
 [-0.01590626]
 [ 0.03906215]
 [-0.0730303 ]]
Predicted Output Values
[225.9732401  115.74763374 163.27610621 114.73638965 120.80385422
 158.21988574 236.08568105 121.81509832  99.56772822 123.83758651
 204.73711411  96.53399594 154.17490936 130.91629517  83.3878227
 171.36685897 137.99500384 137.99500384 189.56845268  84.3990668 ]


[Figure: scatter plot of the test data in black with the fitted regression line in red; x-axis from -0.08 to 0.08]


IMPLEMENTATION OF LINEAR REGRESSION IN SCIKIT-LEARN 


# Importing Modules

from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

import pandas as pd


# Reading the DataFrame
seeds_df = pd.read_csv(
    "https://raw.githubusercontent.com/vihar/unsupervised-learning-with-python/master/seeds-less-rows.csv")


# Remove the grain species from the DataFrame, save for later
varieties = list(seeds_df.pop('grain_variety'))


# Extract the measurements as a NumPy array
samples = seeds_df.values


"""
Perform hierarchical clustering on samples using the
linkage() function with the method='complete' keyword argument.
Assign the result to mergings.
"""


mergings = linkage(samples, method='complete')


"""
Plot a dendrogram using the dendrogram() function on mergings,
specifying the keyword arguments labels=varieties, leaf_rotation=90,
and leaf_font_size=6.
"""
dendrogram(mergings,
           labels=varieties,
           leaf_rotation=90,
           leaf_font_size=6,
           )


plt.show()


OUTPUT 
HIERARCHICAL CLUSTERING 


As its name implies, hierarchical clustering is an algorithm that builds a hierarchy of clusters. The algorithm
begins with each data point assigned to a cluster of its own. Then the two closest clusters are joined into the
same cluster, and this step is repeated. The algorithm ends when only a single cluster is left.


The completion of hierarchical clustering can be shown using a dendrogram. Now let's look at an example of
hierarchical clustering on grain data.
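
The merging process of hierarchical clustering can also be inspected without a plot. The following is a minimal sketch, assuming SciPy is available; the six 2-D points are hypothetical toy data invented for this illustration and are not part of the grain dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated groups of 2-D points
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Each row of mergings records one merge:
# (index of cluster a, index of cluster b, distance between them, size of the new cluster)
mergings = linkage(points, method='complete')

# Cut the hierarchy so that at most two flat clusters remain
labels = fcluster(mergings, t=2, criterion='maxclust')

print(mergings)
print(labels)
```

With n points there are always n - 1 merges, and the last row describes the final merge into a single cluster that contains every point.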


# Importing Modules

from sklearn import datasets

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt


# Loading dataset
iris_df = datasets.load_iris()


# Defining Model
model = TSNE(learning_rate=100)


# Fitting Model
transformed = model.fit_transform(iris_df.data)


# Plotting 2d t-SNE
x_axis = transformed[:, 0]


y_axis = transformed[:, 1]


plt.scatter(x_axis, y_axis, c=iris_df.target)
plt.show()


OUTPUT 





T-SNE CLUSTERING 


One of the unsupervised learning methods for visualization is t-distributed stochastic neighbor embedding, or
t-SNE. It maps a high-dimensional space into a two- or three-dimensional space which can then be visualized.

Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that
similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high
probability.


# Importing Modules

from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


# Load Dataset
iris = load_iris()
# Declaring Model
dbscan = DBSCAN()
# Fitting
dbscan.fit(iris.data)


# Transforming Using PCA

pca = PCA(n_components=2).fit(iris.data)

pca_2d = pca.transform(iris.data)

# Plot based on Class

for i in range(0, pca_2d.shape[0]):
    if dbscan.labels_[i] == 0:
        c1 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='r', marker='+')
    elif dbscan.labels_[i] == 1:
        c2 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='g', marker='o')
    elif dbscan.labels_[i] == -1:
        c3 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='b', marker='*')

plt.legend([c1, c2, c3], ['Cluster 1', 'Cluster 2', 'Noise'])
plt.title('DBSCAN finds 2 clusters and Noise')
plt.show()


OUTPUT


DBSCAN CLUSTERING


[Figure: scatter plot titled "DBSCAN finds 2 clusters and Noise" with legend entries "Cluster 1", "Cluster 2" and "Noise"]





Density-based spatial clustering of applications with noise, or DBSCAN, is a popular clustering algorithm used
as a replacement for k-means in predictive analytics. It does not require the number of clusters as an input,
but two other parameters need to be tuned: the neighborhood radius (eps in scikit-learn) and the minimum number
of points required to form a dense region (min_samples).
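
As a rough illustration of how DBSCAN is configured, the sketch below runs scikit-learn's DBSCAN on synthetic two-moons data. The dataset, the random seed and the values eps=0.2 and min_samples=5 are assumptions chosen for this example, not recommended settings; in practice both parameters need to be tuned for the data at hand.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Synthetic data for illustration: two interleaved half circles with a little noise
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed to form a dense region
model = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = model.labels_

# The label -1 marks noise points; every other label is a cluster id
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())

print("clusters:", n_clusters, "noise points:", n_noise)
```

Note that the number of clusters is an output of the algorithm rather than an input, in contrast to k-means.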


import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from IPython.display import display

import ipywidgets as widgets  # IPython.html.widgets has moved to ipywidgets

from ipywidgets import interact, interactive, fixed, interact_manual


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files

uploaded = files.upload()


import io
df = pd.read_csv(io.BytesIO(uploaded['cars.csv']))
# Replace cars.csv with your own file name.


width = 12

height = 10

plt.figure(figsize=(width, height))

sns.regplot(x="Engine Information.Engine Statistics.Horsepower",
            y="Engine Information.Engine Statistics.Torque", data=df)

# Choose your own columns for the x and y values.

plt.ylim(0,)


OUTPUT 


Regression Plot 


When it comes to simple linear regression, an excellent way
to visualize the fit of our model is by using regression plots.
This plot will show a combination of scattered data points
(a scatter plot), as well as the fitted linear regression line
going through the data. This will give us a reasonable
estimate of the relationship between the two variables,
the strength of the correlation, as well as the direction
(positive or negative correlation).
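
The strength and direction that a regression plot shows visually can also be read off numerically. The sketch below is only an illustration under stated assumptions: the horsepower and torque values are made up for this example and do not come from cars.csv.

```python
import numpy as np

# Hypothetical measurements, invented for this sketch
horsepower = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
torque = np.array([120.0, 160.0, 230.0, 260.0, 330.0])

# Least-squares line: torque is approximately intercept + slope * horsepower
slope, intercept = np.polyfit(horsepower, torque, deg=1)

# Pearson correlation coefficient: the sign gives the direction,
# the magnitude (between 0 and 1) gives the strength
r = np.corrcoef(horsepower, torque)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

A positive slope together with r close to 1 corresponds to a tightly clustered, upward-sloping regression plot.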





Saving cars.csv to cars (1).csv
(0.0, 807.8)


import pandas as pd

import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from IPython.display import display
import ipywidgets as widgets  # IPython.html.widgets has moved to ipywidgets


from ipywidgets import interact, interactive, fixed, interact_manual


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files

uploaded = files.upload()


import io
df = pd.read_csv(io.BytesIO(uploaded['cars.csv']))
# Replace cars.csv with your own file name.


df.head()

width = 12

height = 10


plt.figure(figsize=(width, height))
sns.regplot(x="Engine Information.Engine Statistics.Horsepower", y="Dimensions.Length", data=df)


# Choose your own columns for the x and y values.
plt.ylim(0,)


OUTPUT


Regression Plot 





Saving cars.csv to cars (1).csv
(0.0, 807.8)


import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from IPython.display import display

import ipywidgets as widgets  # IPython.html.widgets has moved to ipywidgets

from ipywidgets import interact, interactive, fixed, interact_manual


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files

uploaded = files.upload()


import io
df = pd.read_csv(io.BytesIO(uploaded['cars.csv']))
# Replace cars.csv with your own file name.


width = 12

height = 10

plt.figure(figsize=(width, height))

sns.residplot(x=df["Engine Information.Engine Statistics.Horsepower"], y=df["Dimensions.Length"])
plt.show()


OUTPUT 


Residual Plot 


The difference between the observed value (y)
and the predicted value (Yhat) is called the
residual (e). When we look at a regression plot,
the residual is the distance from the data point
to the fitted regression line.
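
To make the definition of a residual concrete, here is a minimal sketch that fits a straight line with NumPy and computes e = y - Yhat for each point. The five data points are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical observations, invented for this sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the least-squares line Yhat = intercept + slope * x
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

# Residual: observed value minus predicted value, one per data point
residuals = y - y_hat

print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding error. A residual plot is therefore judged by its pattern: points scattered randomly around zero suggest that a linear model is appropriate.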





Saving cars.csv to cars (1).csv
(0.0, 807.8)


Web scraping from Wikipedia 


Requests: an efficient HTTP library used for accessing web pages. Urllib3: a library used for retrieving data from
URLs. Selenium: an open-source automated testing suite for web applications across different browsers and
platforms.


Requests library 


# import required modules 
import requests 


# get URL 
page = requests.get("https://en.wikipedia.org/wiki/Main_Page")


# display status code
print(page.status_code)


# display scraped data
print(page.content)


OUTPUT
200
b'<!DOCTYPE html> (some HTML output follows here)


Beautiful Soup library


# import required modules 
from bs4 import BeautifulSoup 
import requests 


# get URL 
page = requests.get("https://en.wikipedia.org/wiki/Main_Page")


# scrape webpage
soup = BeautifulSoup(page.content, 'html.parser')


# display scraped data
print(soup.prettify())


OUTPUT <!DOCTYPE html> (some HTML output follows here)


D)felellateMel=teyomlal(om=ferclel(ielmerele) omivlatater 


from bs4 import BeautifulSoup 
import requests 


# get URL 
page = requests.get("https://en.wikipedia.org/wiki/Main_Page")


# scrape webpage
soup = BeautifulSoup(page.content, 'html.parser')


list(soup.children)


# find all occurrences of p in HTML
# includes HTML tags
print(soup.find_all('p'))


print('\n\n')


# return only text
# does not include HTML tags
print(soup.find_all('p')[0].get_text())


OUTPUT


Margaret (born 1991) is a Polish singer and songwriter. Before her mainstream debut, she 
performed with underground bands, recorded soundtracks for television commercials and produced 
a fashion blog. Through her blogging, she was discovered by music manager Sławomir Berdowski
and signed by the record label Extensive Music. Margaret gained international recognition with 
her singles "Thank You Very Much" (2013) and "Cool Me Down" (2016), the first of which was 
included on her first extended play (EP) All I Need, and charted in several European countries. 
In 2014, she released her debut studio album Add the Blonde, which reached the top ten in the 
Polish charts. By re-releasing it in 2016, Margaret had her first Polish top five and Sweden- 
charting single with "Cool Me Down". In 2015, she recorded a collaborative jazz album with Matt 
Dusk titled Just the Two of Us. Her third studio album, Monkey Business (2017), became her 
second top-ten album in Poland. (This article is part of a featured topic: Overview of 
Margaret.) 


Exploring page structure 


For example, the element with id mp-left is the parent element and its nested children have the class mp-h2. So
we will print the information of the first nested child and prettify it using the prettify() function.


# import required modules 
from bs4 import BeautifulSoup 
import requests 


# get URL 
page = requests.get("https://en.wikipedia.org/wiki/Main_Page")


# scrape webpage
soup = BeautifulSoup(page.content, 'html.parser')


# create object
object = soup.find(id="mp-left")


# find tags
items = object.find_all(class_="mp-h2")
result = items[0]


# display tags
print(result.prettify())


OUTPUT


<h2 class="mp-h2" id="mp-tfa-h2">
 <span id="From_today.27s_featured_article">
 </span>
 <span class="mw-headline" id="From_today's_featured_article">
  From today's featured article
 </span>
</h2>


Regular Expression 


A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if
a particular string matches a given regular expression (or if a given regular expression matches a particular string,
which comes down to the same thing).
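
As a small illustration of checking whether a string matches a regular expression, the sketch below uses only the standard re module. The date-like pattern, the helper name matches and the sample strings are assumptions made for this example.

```python
import re

# Hypothetical pattern for this sketch: four digits, two digits, two digits, dash-separated
pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

def matches(text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

# search() finds the first occurrence anywhere inside a longer string
found = pattern.search('released on 2016-08-19 in Poland')

print(matches('2016-08-19'))
print(matches('2016-8-19'))
print(found.group(0))
```

fullmatch() requires the entire string to match, while search() scans for a match at any position, so the two answer slightly different questions.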


import re
m = re.search('(?<=abc)def', 'abcdef')


print(m.group(0))


OUTPUT def


import re
m = re.search(r'(?<=-)\w+', 'spam-egg')
print(m.group(0))


OUTPUT egg


import re 

Blaise), Sjollilin Ge hey “WicieC Se ious. “iene, ©) 
lopigtarn se labie tesa) 8 iioyatelss “wierecian wenselsi, + | 
@—re, colin r Wee, es WOGdS Words, 5 wOLnds .. 7m.) 


( 
( 
( 
Omieojceulbe (ie! Weel, Villorcle, “worecle. “orci. yp IL) 
eae sollabmice (Nee YS A iolclie eucles 4 5 7) 
Ge so dine (Gee ina nee beeen Otc Gis cee ce 
neigh csyolabied ies MOU ev Seo niCheels 3 ares |, 
[Oneabane lish Gah ere vo Vdine ore Wamome ar igh ree ghee me Cena g 
['Words', 'words', 'words', '']

OUTPUT ['Words', ', ', 'words', ', ', 'words', '.', '']

['Words', 'words, words.']

['Words', 'words, words.']

ieee vee “words, “ef en “words, eee ""y 

Bae “words”, ""y 

fas ee ee “words, ane tie "" | 

fusr/1lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match. 

return compile(pattern, flags).split(string, maxsplit) 
import re
re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
       r'static PyObject*\npy_\1(void)\n{',
       'def myfunc():')


OUTPUT 'static PyObject*\npy_myfunc(void)\n{'

import re
import string
print(re.escape('http://www.python.org'))


legal_chars = string.ascii_lowercase + string.digits + "!#$%&'*+-.^_`|~:"
print('[%s]+' % re.escape(legal_chars))

operators = ['+', '-', '*', '/', '**']

print('|'.join(map(re.escape, sorted(operators, reverse=True))))


digits_re = r'\d+'
sample = '/usr/sbin/sendmail - 0 errors, 12 warnings'
print(re.sub(digits_re, digits_re.replace('\\', r'\\'), sample))


OUTPUT
http\:\/\/www\.python\.org
[abcdefghijklmnopqrstuvwxyz0123456789\!\#\$\%\&\'\*\+\-\.\^_\`\|\~\:]+
\/|\-|\+|\*\*|\*


/usr/sbin/sendmail - \d+ errors, \d+ warnings


import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
print(m.group(0))

print(m.group(1, 2))


m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
print(m[0])
print(m[1])
print(m[2])


m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(m.group('first_name'))


print(m.group('last_name'))


m = re.match(r"(\d+)\.(\d+)", "24.1632")
print(m.groups())


m = re.match(r"(\d+)\.?(\d+)?", "24")


print(m.groups()) # Second group defaults to None.
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
print(m.groupdict())

email = "DonRakhaJohn@don.com"
m = re.search("John", email)
print(email[:m.start()] + email[m.end():])


OUTPUT

Isaac Newton
('Isaac', 'Newton')
Isaac Newton
Isaac
Newton
Malcolm
Reynolds
('24', '1632')
('24', None)

{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
DonRakha@don.com





RESHAPING


PYTHON CODE















import pandas as pd

import numpy as np

import seaborn as sns #visualisation

import matplotlib.pyplot as plt #visualisation


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files

uploaded = files.upload()


import io
df = pd.read_csv(io.BytesIO(uploaded['cars.csv']))
# Replace cars.csv with your own file name.


# Print the first 5 rows
df.head()





Using stack() method


The stack() method works with MultiIndex objects in a DataFrame: it returns a DataFrame (or Series) whose index
has a new inner-most level of row labels. It changes a wide table into a long table.
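
The wide-to-long effect of stacking is easier to see on a tiny table. The sketch below is an illustration only: the two-row DataFrame and its labels (car_a, car_b, height, width) are hypothetical and are not taken from cars.csv.

```python
import pandas as pd

# Hypothetical wide table: one row per car, one column per measurement
wide = pd.DataFrame(
    {'height': [140, 150], 'width': [202, 180]},
    index=['car_a', 'car_b'],
)

# stack() moves the column labels into a new inner-most index level,
# giving one (row label, column label) pair per value
long = wide.stack()

# unstack() reverses the operation and rebuilds the wide table
restored = long.unstack()

print(long)
print(restored)
```

Here the stacked result has four entries, indexed by pairs such as ('car_a', 'width'), and unstacking it recovers the original two-by-two layout.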


import pandas as pd

import numpy as np

import seaborn as sns #visualisation

import matplotlib.pyplot as plt #visualisation


# Upload a file from your local system.
# This method works in Google Colaboratory.

from google.colab import files

uploaded = files.upload()


import io
df = pd.read_csv(io.BytesIO(uploaded['cars.csv']))
# Replace cars.csv with your own file name.


# reshape the dataframe using stack() method
df_stacked = df.stack()


print(df_stacked.head(26))


OUTPUT 


0  Dimensions.Height                                                                    140
   Dimensions.Length                                                                    143
   Dimensions.Width                                                                     202
   Engine Information.Driveline                                             All-wheel drive
   Engine Information.Engine Type                      Audi 3.2L 6 cylinder 250hp 236ft-lbs
   Engine Information.Hybrid                                                           True
   Engine Information.Number of Forward Gears                                             6
   Engine Information.Transmission                           6 Speed Automatic Select Shift
   Fuel Information.City mpg                                                             18
   Fuel Information.Fuel Type                                                      Gasoline
   Fuel Information.Highway mpg                                                          25
   Identification.Classification                                     Automatic transmission
   Identification.ID                                                       2009 Audi A3 3.2
   Identification.Make                                                                 Audi
   Identification.Model Year                                                   2009 Audi A3
   Identification.Year                                                                 2009
   Engine Information.Engine Statistics.Horsepower                                      250
   Engine Information.Engine Statistics.Torque                                          236
1  Dimensions.Height                                                                    140
   Dimensions.Length                                                                    143
   Dimensions.Width                                                                     202
   Engine Information.Driveline                                           Front-wheel drive
   Engine Information.Engine Type              Audi 2.0L 4 cylinder 200 hp 207 ft-lbs Turbo
   Engine Information.Hybrid                                                           True
   Engine Information.Number of Forward Gears                                             6
   Engine Information.Transmission                           6 Speed Automatic Select Shift








RUEPSi 


PAY THON = 





import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation

# Upload a file from your system with this method.
# This is possible in Google Colaboratory (Colab).
from google.colab import files

uploaded = files.upload()

import io
df = pd.read_csv(io.BytesIO(uploaded['Soper.csv']))
# 'Soper.csv' is your file name.

# unstack() method: reverses stack() and restores the wide layout
df_stacked = df.stack()
df_unstacked = df_stacked.unstack()
df_unstacked.head(10)


CODE 





Using melt() method


melt() in pandas reshapes a DataFrame from wide format to long format. The id_vars=['col_names'] argument names the
identifier columns to keep; every other column is melted into a pair of "variable" and "value" columns.
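To make the role of id_vars concrete, here is a minimal sketch on a small invented table. The column names and values are hypothetical and chosen only so the result is easy to read.

```python
import pandas as pd

# Hypothetical table: one identifier column and two measurement columns.
wide = pd.DataFrame({
    "Driveline": ["All-wheel drive", "Front-wheel drive"],
    "Height": [140, 140],
    "Length": [143, 143],
})

# Keep "Driveline" as the identifier; every other column is unpivoted
# into (variable, value) pairs, giving one row per original cell.
long = wide.melt(id_vars=["Driveline"])

print(long)
```

The identifier column is repeated once for each melted column, so a table with two rows and two measurement columns yields four rows.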


import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation

# Upload a file from your system with this method.
# This is possible in Google Colaboratory (Colab).
from google.colab import files

uploaded = files.upload()

import io
df = pd.read_csv(io.BytesIO(uploaded['Soper.csv']))
# 'Soper.csv' is your file name.

# keep two identifier columns: "Engine Information.Driveline" and "Fuel Information.Fuel Type"
df_melt = df.melt(id_vars=['Engine Information.Driveline', 'Fuel Information.Fuel Type'])
df_melt.head(10)


OUTPUT 


   Engine Information.Driveline  Fuel Information.Fuel Type  variable           value
0  All-wheel drive               Gasoline                    Dimensions.Height  140
1  Front-wheel drive             Gasoline                    Dimensions.Height  140
2  Front-wheel drive             Gasoline                    Dimensions.Height  140
3  All-wheel drive               Gasoline                    Dimensions.Height  140
4  All-wheel drive               Gasoline                    Dimensions.Height  140
5  All-wheel drive               Gasoline                    Dimensions.Height  91
6  All-wheel drive               Gasoline                    Dimensions.Height  91
7  All-wheel drive               Gasoline                    Dimensions.Height  201
8  All-wheel drive               Gasoline                    Dimensions.Height  201
9  All-wheel drive               Gasoline                    Dimensions.Height  147


Decision Tree Classifier 


The simplest way to picture a decision tree classifier is as a binary tree. At the root and at every internal
node, a question is asked about one feature, and the data at that node is split according to the answer. Let's take an
example of training a classifier in scikit-learn.
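To see the "question at each node" idea directly, the short sketch below prints the rules learned by a shallow tree. It is an illustration under assumptions: the iris dataset, max_depth=2 and random_state=0 are choices made here for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders each internal node as a threshold test on one feature.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Every indented line of the printout is one question of the form "feature <= threshold", and the leaves show the predicted class.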


from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# load the iris dataset
iris = datasets.load_iris()

X = iris.data
y = iris.target

# split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# train a shallow decision tree and predict on the test set
from sklearn.tree import DecisionTreeClassifier
dtree_model = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
dtree_predictions = dtree_model.predict(X_test)

# confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, dtree_predictions)

print(cm)


OUTPUT: the confusion matrix for the test set is printed.


import nltk

nltk.download('brown')

from nltk.corpus import brown
brown.words()  # Returns a list of strings
len(brown.words())  # No. of words in the corpus

brown.sents()  # Returns a list of lists of strings
len(brown.sents())  # No. of sentences in the corpus

brown.sents(fileids='ca01')  # You can access a specific file with the `fileids` argument.

len(brown.fileids())  # 500 sources, each file is a source.

print(brown.fileids()[:100])  # First 100 sources.

print(brown.raw('cb01').strip()[:1000])  # First 1000 characters.













from io import StringIO
from sklearn.feature_extraction.text import CountVectorizer

sent1 = "The quick brown fox jumps over the lazy brown dog."
sent2 = "Mr brown jumps over the lazy fox."

with StringIO('\n'.join([sent1, sent2])) as fin:
    # Create the vectorizer
    count_vect = CountVectorizer()
    count_vect.fit_transform(fin)

# We can check the vocabulary in our vectorizer.
# It's a dictionary where the words are the keys and
# the values are the IDs given to each word.
count_vect.vocabulary_











import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation

from sklearn.model_selection import train_test_split

# Upload a file from your system with this method.
# This is possible in Google Colaboratory (Colab).
from google.colab import files

uploaded = files.upload()

import io
df = pd.read_csv(io.BytesIO(uploaded['Soper.csv']))
# 'Soper.csv' is your file name.





Preprocessing and Train Test Split 


import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation

from sklearn.model_selection import train_test_split

# Upload a file from your system with this method.
# This is possible in Google Colaboratory (Colab).
from google.colab import files

uploaded = files.upload()

import io
df = pd.read_csv(io.BytesIO(uploaded['Soper.csv']))
# 'Soper.csv' is your file name.

df.head()

# features: every column except the last; target: the column at position 4
X = df.iloc[:, :-1].values
y = df.iloc[:, 4].values

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
print(X_train)


OUTPUT

[Output: the X_train array. The leading fields of each row are illegible in the source; the trailing text fields read:]
 [... 'Arts and Recreation Services' 'Current prices' 'Unadjusted']
 [... 'Health Care and Social Assistance' 'Current prices' 'Unadjusted']
 ...
 [... 'Rental, Hiring and Real Estate Services' 'Current prices' 'Unadjusted']
 [... 'Transport, Postal and Warehousing' 'Current prices' 'Unadjusted']]


Classification


from sklearn.neighbors import (NeighborhoodComponentsAnalysis, KNeighborsClassifier)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.7,
                                                    random_state=42)

# learn a linear transformation with NCA, then classify with k-nearest neighbors
nca = NeighborhoodComponentsAnalysis(random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
nca_pipe = Pipeline([('nca', nca), ('knn', knn)])

nca_pipe.fit(X_train, y_train)

print(nca_pipe.score(X_test, y_test))


OUTPUT 


0.9619047619047619


Unsupervised Nearest Neighbors 


from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)

print(indices, "\n", distances, "\n", nbrs.kneighbors_graph(X).toarray())

OUTPUT

indices:
[[0 1]
 [1 0]
 [2 1]
 [3 4]
 [4 3]
 [5 4]]

distances:
[[0.         1.        ]
 [0.         1.        ]
 [0.         1.41421356]
 [0.         1.        ]
 [0.         1.        ]
 [0.         1.41421356]]

neighbors graph:
[[1. 1. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 1. 1.]]


KD Tree and BallTree Classes 


from sklearn.neighbors import KDTree
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
kdt = KDTree(X, leaf_size=30, metric='euclidean')

# indices of the 2 nearest neighbors of every point (each point is its own nearest neighbor)
kdt.query(X, k=2, return_distance=False)

OUTPUT

array([[0, 1],
       [1, 0],
       [2, 1],
       [3, 4],
       [4, 3],
       [5, 4]])


Comparing Nearest Neighbors with and without Neighborhood 
Components Analysis 


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import (KNeighborsClassifier,
                               NeighborhoodComponentsAnalysis)
from sklearn.pipeline import Pipeline

n_neighbors = 1

dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

# we only take two features. We could avoid this ugly
# slicing by using a two-dim dataset
X = X[:, [0, 2]]

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, stratify=y, test_size=0.7, random_state=42)

h = .01  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

names = ['KNN', 'NCA, KNN']

classifiers = [Pipeline([('scaler', StandardScaler()),
                         ('knn', KNeighborsClassifier(n_neighbors=n_neighbors))
                         ]),
               Pipeline([('scaler', StandardScaler()),
                         ('nca', NeighborhoodComponentsAnalysis()),
                         ('knn', KNeighborsClassifier(n_neighbors=n_neighbors))
                         ])
               ]

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

for name, clf in zip(names, classifiers):

    clf.fit(X_train, y_train)
    score = clf.score(X_test, y_test)

    # Predict the class of every point of the mesh
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light, alpha=.8)

    # Plot also the training and testing points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("{} (k = {})".format(name, n_neighbors))
    plt.text(0.9, 0.1, '{:.2f}'.format(score), size=15,
             ha='center', va='center', transform=plt.gca().transAxes)

plt.show()


OUTPUT 








Nearest Neighbors Classification


import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15

# import some data to play with
iris = datasets.load_iris()

# we only take the first two features. We could avoid this ugly
# slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

h = .02  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']

for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=iris.target_names[y],
                    palette=cmap_bold, alpha=1.0, edgecolor="black")
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])

plt.show()


OUTPUT 
Nearest Neighbors regression


# Generate sample data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors

np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
T = np.linspace(0, 5, 500)[:, np.newaxis]
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 1 * (0.5 - np.random.rand(8))

# Fit regression model
n_neighbors = 5

for i, weights in enumerate(['uniform', 'distance']):
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X, y).predict(T)

    plt.subplot(2, 1, i + 1)
    plt.scatter(X, y, color='darkorange', label='data')
    plt.plot(T, y_, color='navy', label='prediction')
    plt.axis('tight')
    plt.legend()
    plt.title("KNeighborsRegressor (k = %i, weights = '%s')" % (n_neighbors,
                                                                weights))

plt.tight_layout()
plt.show()


OUTPUT

[Figure: KNeighborsRegressor predictions plotted over the sample data; legend entries: prediction, data.]







import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

X, y = datasets.load_iris(return_X_y=True)
print(X.shape)
print(y.shape)

# hold out 40% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

print(X_train.shape)
print(y_train.shape)

print(X_test.shape)
print(y_test.shape)

# fit a linear SVM on the training set and score it on the held-out set
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

print(clf.score(X_test, y_test))





Visualizing cross-validation behavior in scikit-learn: Visualize our data


(1)


from sklearn.model_selection import (TimeSeriesSplit, KFold, ShuffleSplit,
                                     StratifiedKFold, GroupShuffleSplit,
                                     GroupKFold, StratifiedShuffleSplit)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

np.random.seed(1338)
cmap_data = plt.cm.Paired
cmap_cv = plt.cm.coolwarm
n_splits = 4

# Generate the class/group data
n_points = 100
X = np.random.randn(100, 10)

percentiles_classes = [.1, .3, .6]
y = np.hstack([[ii] * int(100 * perc)
               for ii, perc in enumerate(percentiles_classes)])

# Evenly spaced groups repeated once
groups = np.hstack([[ii] * 10 for ii in range(10)])


def visualize_groups(classes, groups, name):
    # Visualize dataset groups
    fig, ax = plt.subplots()
    ax.scatter(range(len(groups)), [.5] * len(groups), c=groups, marker='_',
               lw=50, cmap=cmap_data)
    ax.scatter(range(len(groups)), [3.5] * len(groups), c=classes, marker='_',
               lw=50, cmap=cmap_data)
    ax.set(ylim=[-1, 5], yticks=[.5, 3.5],
           yticklabels=['Data\ngroup', 'Data\nclass'], xlabel="Sample index")


visualize_groups(y, groups, 'no groups')


OUTPUT

[Figure: data group and data class of each sample, plotted against sample index.]


Visualizing cross-validation behavior in scikit-learn: Visualize our data 


(2) 


from sklearn.model_selection import (TimeSeriesSplit, KFold, ShuffleSplit,
                                     StratifiedKFold, GroupShuffleSplit,
                                     GroupKFold, StratifiedShuffleSplit)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

np.random.seed(1338)
cmap_data = plt.cm.Paired
cmap_cv = plt.cm.coolwarm
n_splits = 4

# X, y and groups are the arrays generated in part (1).


def plot_cv_indices(cv, X, y, group, ax, n_splits, lw=10):
    """Create a sample plot for indices of a cross-validation object."""

    # Generate the training/testing visualizations for each CV split
    for ii, (tr, tt) in enumerate(cv.split(X=X, y=y, groups=group)):
        # Fill in indices with the training/test groups
        indices = np.array([np.nan] * len(X))
        indices[tt] = 1
        indices[tr] = 0

        # Visualize the results
        ax.scatter(range(len(indices)), [ii + .5] * len(indices),
                   c=indices, marker='_', lw=lw, cmap=cmap_cv,
                   vmin=-.2, vmax=1.2)

    # Plot the data classes and groups at the end
    ax.scatter(range(len(X)), [ii + 1.5] * len(X),
               c=y, marker='_', lw=lw, cmap=cmap_data)

    ax.scatter(range(len(X)), [ii + 2.5] * len(X),
               c=group, marker='_', lw=lw, cmap=cmap_data)

    # Formatting
    yticklabels = list(range(n_splits)) + ['class', 'group']
    ax.set(yticks=np.arange(n_splits + 2) + .5, yticklabels=yticklabels,
           xlabel='Sample index', ylabel="CV iteration",
           ylim=[n_splits + 2.2, -.2], xlim=[0, 100])
    ax.set_title('{}'.format(type(cv).__name__), fontsize=15)
    return ax


fig, ax = plt.subplots()
cv = KFold(n_splits)
plot_cv_indices(cv, X, y, groups, ax, n_splits)

OUTPUT

<matplotlib.axes._subplots.AxesSubplot at 0x773659745698>

[Figure: "KFold". Training and test assignment for each CV iteration, plotted against sample index.]


Visualizing cross-validation behavior in scikit-learn: Visualize our data 


(S) 


from sklearn.model_selection import (TimeSeriesSplit, KFold, ShuffleSplit,
                                     StratifiedKFold, GroupShuffleSplit,
                                     GroupKFold, StratifiedShuffleSplit)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

np.random.seed(1338)
cmap_data = plt.cm.Paired
cmap_cv = plt.cm.coolwarm
n_splits = 4

# X, y and groups are the arrays generated in part (1).


def plot_cv_indices(cv, X, y, group, ax, n_splits, lw=10):
    """Create a sample plot for indices of a cross-validation object."""

    # Generate the training/testing visualizations for each CV split
    for ii, (tr, tt) in enumerate(cv.split(X=X, y=y, groups=group)):
        # Fill in indices with the training/test groups
        indices = np.array([np.nan] * len(X))
        indices[tt] = 1
        indices[tr] = 0

        # Visualize the results
        ax.scatter(range(len(indices)), [ii + .5] * len(indices),
                   c=indices, marker='_', lw=lw, cmap=cmap_cv,
                   vmin=-.2, vmax=1.2)

    # Plot the data classes and groups at the end
    ax.scatter(range(len(X)), [ii + 1.5] * len(X),
               c=y, marker='_', lw=lw, cmap=cmap_data)

    ax.scatter(range(len(X)), [ii + 2.5] * len(X),
               c=group, marker='_', lw=lw, cmap=cmap_data)

    # Formatting
    yticklabels = list(range(n_splits)) + ['class', 'group']
    ax.set(yticks=np.arange(n_splits + 2) + .5, yticklabels=yticklabels,
           xlabel='Sample index', ylabel="CV iteration",
           ylim=[n_splits + 2.2, -.2], xlim=[0, 100])
    ax.set_title('{}'.format(type(cv).__name__), fontsize=15)
    return ax


fig, ax = plt.subplots()
cv = StratifiedKFold(n_splits)
plot_cv_indices(cv, X, y, groups, ax, n_splits)


OUTPUT

<matplotlib.axes._subplots.AxesSubplot at 0x773659697598>

[Figure: "StratifiedKFold". Training and test assignment for each CV iteration, plotted against sample index.]
from sklearn.decomposition import PCA

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


# #############################################################################
# Create the data

e = np.exp(1)
np.random.seed(4)


def pdf(x):
    return 0.5 * (stats.norm(scale=0.25 / e).pdf(x)
                  + stats.norm(scale=4 / e).pdf(x))


y = np.random.normal(scale=0.5, size=(30000))
x = np.random.normal(scale=0.5, size=(30000))
z = np.random.normal(scale=0.1, size=len(x))

density = pdf(x) * pdf(y)
pdf_z = pdf(5 * z)

density *= pdf_z

a = x + y
b = 2 * y
c = a - b + z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm


# #############################################################################
# Plot the figures
def plot_figs(fig_num, elev, azim):
    fig = plt.figure(fig_num, figsize=(4, 3))
    plt.clf()
    ax = fig.add_subplot(111, projection='3d', elev=elev, azim=azim)
    ax.set_position([0, 0, .95, 1])

    ax.scatter(a[::10], b[::10], c[::10], c=density[::10], marker='+', alpha=.4)
    Y = np.c_[a, b, c]

    # Using SciPy's SVD, this would be:
    # _, pca_score, V = scipy.linalg.svd(Y, full_matrices=False)

    pca = PCA(n_components=3)
    pca.fit(Y)
    V = pca.components_

    x_pca_axis, y_pca_axis, z_pca_axis = 3 * V.T

    x_pca_plane = np.r_[x_pca_axis[:2], - x_pca_axis[1::-1]]
    y_pca_plane = np.r_[y_pca_axis[:2], - y_pca_axis[1::-1]]
    z_pca_plane = np.r_[z_pca_axis[:2], - z_pca_axis[1::-1]]
    x_pca_plane.shape = (2, 2)
    y_pca_plane.shape = (2, 2)
    z_pca_plane.shape = (2, 2)
    ax.plot_surface(x_pca_plane, y_pca_plane, z_pca_plane)
    ax.xaxis.set_ticklabels([])
    ax.yaxis.set_ticklabels([])
    ax.zaxis.set_ticklabels([])


elev = -40
azim = -80
plot_figs(1, elev, azim)

elev = 30
azim = 20
plot_figs(2, elev, azim)

plt.show()


OUTPUT

[Figure: two 3D views of the point cloud together with the plane spanned by the first two principal components.]


M4, jOlere. | Oi ILieligue: } 





Incremental PCA


Incremental principal component analysis (IPCA) is typically used as a replacement for principal component
analysis (PCA) when the dataset to be decomposed is too large to fit in memory. IPCA builds a low-rank
approximation for the input data using an amount of memory which is independent of the number of input data
samples. It is still dependent on the input data features, but changing the batch size allows for control of memory
usage.
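The memory argument can be made concrete with partial_fit, which updates the model one chunk at a time. The sketch below is an illustration under assumptions: the iris data stands in for a dataset that would not fit in memory, and the chunk size of 50 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import IncrementalPCA

X = load_iris().data

ipca = IncrementalPCA(n_components=2)

# Feed the data in chunks, as if each chunk were read from disk separately.
# Only one chunk needs to be held in memory at any time.
for start in range(0, X.shape[0], 50):
    ipca.partial_fit(X[start:start + 50])

# Project the samples onto the two learned components.
X_reduced = ipca.transform(X)
print(X_reduced.shape)
```

Calling fit or fit_transform with a batch_size argument does the same chunking internally when the whole array is already available.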


import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, IncrementalPCA

iris = load_iris()
X = iris.data
y = iris.target

n_components = 2
ipca = IncrementalPCA(n_components=n_components, batch_size=10)
X_ipca = ipca.fit_transform(X)

pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X)

colors = ['navy', 'turquoise', 'darkorange']

for X_transformed, title in [(X_ipca, "Incremental PCA"), (X_pca, "PCA")]:
    plt.figure(figsize=(8, 8))
    for color, i, target_name in zip(colors, [0, 1, 2], iris.target_names):
        plt.scatter(X_transformed[y == i, 0], X_transformed[y == i, 1],
                    color=color, lw=2, label=target_name)

    if "Incremental" in title:
        # compare absolute values because the sign of each component is arbitrary
        err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()
        plt.title(title + " of iris dataset\nMean absolute unsigned error "
                  "%.6f" % err)
    else:
        plt.title(title + " of iris dataset")
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.axis([-4, 4, -1.5, 1.5])

plt.show()


OUTPUT 
PCA example with Iris Data-set


import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


from sklearn import decomposition
from sklearn import datasets

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data
y = iris.target

fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = fig.add_subplot(111, projection='3d', elev=48, azim=134)
ax.set_position([0, 0, .95, 1])

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

for name, label in [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.nipy_spectral,
           edgecolor='k')

ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])

plt.show()


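OUTPUT

[Figure: 3D scatter plot of the iris samples in principal-component space, with the clusters labelled Setosa, Versicolour and Virginica.]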





Pipelining: chaining a PCA and a logistic regression 
PCA performs unsupervised dimensionality reduction, while logistic regression makes the prediction.
We use GridSearchCV to choose the dimensionality of the PCA.
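Grid search over a pipeline relies on addressing the parameters of individual steps by name. The minimal sketch below shows that naming scheme on its own; the step names "pca" and "logistic" and the parameter values are illustrative choices.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[("pca", PCA()), ("logistic", LogisticRegression())])

# "<step name>__<parameter>" addresses a parameter of one step.
pipe.set_params(pca__n_components=5, logistic__C=0.1)

n_components = pipe.named_steps["pca"].n_components
C = pipe.named_steps["logistic"].C
print(n_components, C)
```

A parameter grid for GridSearchCV uses the same "step__parameter" keys, with a list of candidate values for each key.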


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV


# Define a pipeline to search for the best combination of PCA truncation
# and classifier regularization.
pca = PCA()
# set the tolerance to a large value to make the example faster
logistic = LogisticRegression(max_iter=10000, tol=0.1)
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

X_digits, y_digits = datasets.load_digits(return_X_y=True)

# Parameters of pipelines can be set using '__' separated parameter names:
param_grid = {
    'pca__n_components': [5, 15, 30, 45, 64],
    'logistic__C': np.logspace(-4, 4, 4),
}
search = GridSearchCV(pipe, param_grid, n_jobs=-1)
search.fit(X_digits, y_digits)
print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)

# Plot the PCA spectrum
pca.fit(X_digits)

fig, (ax0, ax1) = plt.subplots(nrows=2, sharex=True, figsize=(6, 6))
ax0.plot(np.arange(1, pca.n_components_ + 1),
         pca.explained_variance_ratio_, '+', linewidth=2)
ax0.set_ylabel('PCA explained variance ratio')

ax0.axvline(search.best_estimator_.named_steps['pca'].n_components,
            linestyle=':', label='n_components chosen')
ax0.legend(prop=dict(size=12))

# For each number of components, find the best classifier results
results = pd.DataFrame(search.cv_results_)
components_col = 'param_pca__n_components'
best_clfs = results.groupby(components_col).apply(
    lambda g: g.nlargest(1, 'mean_test_score'))

best_clfs.plot(x=components_col, y='mean_test_score', yerr='std_test_score',
               legend=False, ax=ax1)
ax1.set_ylabel('Classification accuracy (val)')
ax1.set_xlabel('n_components')

plt.xlim(-1, 70)

plt.tight_layout()
plt.show()


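OUTPUT

[Figure: PCA explained variance ratio (top) and classification accuracy (bottom) against n_components; a dotted vertical line marks "n_components chosen".]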








Plotting Cross-Validated Predictions 


from sklearn import datasets
from sklearn.model_selection import cross_val_predict
from sklearn import linear_model
import matplotlib.pyplot as plt

lr = linear_model.LinearRegression()
X, y = datasets.load_diabetes(return_X_y=True)

# cross_val_predict returns an array of the same size as `y` where each entry
# is a prediction obtained by cross validation:
predicted = cross_val_predict(lr, X, y, cv=10)

fig, ax = plt.subplots()
ax.scatter(y, predicted, edgecolors=(0, 0, 0))
# dashed diagonal: where a perfect prediction would fall
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')

plt.show()


OUTPUT 
Non-linear SVM


Perform binary classification using a non-linear SVC with an RBF kernel. The target to predict is an XOR of the inputs.
The color map illustrates the decision function learned by the SVC.
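Why a non-linear kernel is needed for an XOR target can be checked numerically. The following sketch is an illustration under assumptions: it uses SVC with default C, gamma='auto', and training accuracy on 300 random points as a rough measure of fit.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
# XOR target: True when exactly one coordinate is positive.
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# A single straight line cannot separate the four quadrants correctly.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
# The RBF kernel can bend the decision boundary around the quadrants.
rbf_acc = SVC(kernel="rbf", gamma="auto").fit(X, y).score(X, y)

print(linear_acc, rbf_acc)
```

The linear model stays far from a perfect fit on this target, while the RBF model fits the same points much more closely.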


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

xx, yy = np.meshgrid(np.linspace(-3, 3, 500),
                     np.linspace(-3, 3, 500))
np.random.seed(0)
X = np.random.randn(300, 2)
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# fit the model
clf = svm.NuSVC(gamma='auto')
clf.fit(X, Y)

# plot the decision function for each datapoint on the grid
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()), aspect='auto',
           origin='lower', cmap=plt.cm.PuOr_r)
contours = plt.contour(xx, yy, Z, levels=[0], linewidths=2,
                       linestyles='dashed')
plt.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired,
            edgecolors='k')
plt.xticks(())
plt.yticks(())
plt.axis([-3, 3, -3, 3])
plt.show()
SVM: Maximum margin separating hyperplane


Plot the maximum margin separating hyperplane within a two-class separable dataset using a Support Vector 
Machine classifier with linear kernel. 
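For a linear kernel the margin can also be read off numerically: with weight vector w, the two margin lines lie at decision values -1 and +1, so the distance between them is 2 / ||w||. The sketch below is a minimal check of that relation; the dataset parameters (40 points, random_state=6) and C=1000 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]                         # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

# Support vectors sit on the margin, where |decision_function| is about 1.
sv_scores = np.abs(clf.decision_function(clf.support_vectors_))

print(margin_width)
print(sv_scores)
```

A large C leaves almost no room for margin violations, which is why the support vectors end up on the margin lines themselves.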


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs


# we create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# fit the model, don't regularize for illustration purposes
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# plot the decision function
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
           linewidth=1, facecolors='none', edgecolors='k')
plt.show()
SVM with custom kernel


Simple usage of Support Vector Machines to classify a sample. It will plot the decision surface and the support
vectors.


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
Y = iris.target


def my_kernel(X, Y):
    """
    We create a custom kernel:

                 (2  0)
    k(X, Y) = X  (    ) Y.T
                 (0  1)
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)


h = .02  # step size in the mesh

# we create an instance of SVM and fit our data.
clf = svm.SVC(kernel=my_kernel)
clf.fit(X, Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired, edgecolors='k')
plt.title('3-Class classification using Support Vector Machine with custom'
          ' kernel')
plt.axis('tight')
plt.show()
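A custom kernel passed to SVC as a callable is only used to build the Gram matrix of pairwise kernel values. The sketch below, which assumes the same kernel k(X, Y) = X M Y.T with M = diag(2, 1) and the first two iris features, checks this by fitting one model with the callable and one with kernel="precomputed". The variable names (gram, clf_callable, clf_precomputed) are illustrative only.

```python
import numpy as np
from sklearn import datasets, svm


def my_kernel(X, Y):
    # k(X, Y) = X M Y^T with M = diag(2, 1)
    M = np.array([[2.0, 0.0], [0.0, 1.0]])
    return X @ M @ Y.T


iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features only
y = iris.target

# Gram matrix of the training set: entry (i, j) is k(x_i, x_j)
gram = my_kernel(X, X)

# Same model, specified two ways
clf_callable = svm.SVC(kernel=my_kernel).fit(X, y)
clf_precomputed = svm.SVC(kernel="precomputed").fit(gram, y)

same_predictions = np.array_equal(
    clf_callable.predict(X), clf_precomputed.predict(gram)
)
print("Gram matrix shape:", gram.shape)
print("Identical predictions:", same_predictions)
```

The precomputed form can be convenient when the kernel is expensive to evaluate, because the Gram matrix can be cached and reused.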
Plot the support vectors in LinearSVC


Unlike SVC (based on LIBSVM), LinearSVC (based on LIBLINEAR) does not provide the support vectors. This 
example demonstrates how to obtain the support vectors in LinearSVC. 


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)

plt.figure(figsize=(10, 5))
for i, C in enumerate([1, 100]):
    # "hinge" is the standard SVM loss
    clf = LinearSVC(C=C, loss="hinge", random_state=42).fit(X, y)
    # obtain the support vectors through the decision function
    decision_function = clf.decision_function(X)
    # we can also calculate the decision function manually
    # decision_function = np.dot(X, clf.coef_[0]) + clf.intercept_[0]
    # The support vectors are the samples that lie within the margin
    # boundaries, whose size is conventionally constrained to 1
    support_vector_indices = np.where(
        np.abs(decision_function) <= 1 + 1e-15)[0]
    support_vectors = X[support_vector_indices]

    plt.subplot(1, 2, i + 1)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
                         np.linspace(ylim[0], ylim[1], 50))
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
                linestyles=['--', '-', '--'])
    plt.scatter(support_vectors[:, 0], support_vectors[:, 1], s=100,
                linewidth=1, facecolors='none', edgecolors='k')
    plt.title("C=" + str(C))
plt.tight_layout()
plt.show()
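The idea of recovering support vectors from the decision function can be checked without any plotting. The following is a minimal sketch, assuming the same make_blobs data; max_iter=10000 is an added assumption to help the solver converge, and the names manual, builtin and inside_margin are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = LinearSVC(C=1, loss="hinge", random_state=42, max_iter=10000).fit(X, y)

# Decision function computed by hand: w . x + b
manual = X @ clf.coef_[0] + clf.intercept_[0]
builtin = clf.decision_function(X)

# Samples on or inside the margin (|f(x)| <= 1) act as support vectors
inside_margin = np.where(np.abs(manual) <= 1 + 1e-15)[0]

print("manual == builtin:", np.allclose(manual, builtin))
print("samples within the margin:", len(inside_margin))
```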
Support Vector Regression (SVR) using linear and non-linear kernels


1D regression using linear, polynomial and RBF kernels. 


import numpy as np
from sklearn.svm import SVR
import matplotlib.pyplot as plt

# Generate sample data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()

# Add noise to every fifth target
y[::5] += 3 * (0.5 - np.random.rand(8))

# Fit regression models
svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=.1)
svr_lin = SVR(kernel='linear', C=100, gamma='auto')
svr_poly = SVR(kernel='poly', C=100, gamma='auto', degree=3, epsilon=.1,
               coef0=1)

# Look at the results
lw = 2

svrs = [svr_rbf, svr_lin, svr_poly]
kernel_label = ['RBF', 'Linear', 'Polynomial']
model_color = ['m', 'c', 'g']

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 10), sharey=True)
for ix, svr in enumerate(svrs):
    axes[ix].plot(X, svr.fit(X, y).predict(X), color=model_color[ix], lw=lw,
                  label='{} model'.format(kernel_label[ix]))
    axes[ix].scatter(X[svr.support_], y[svr.support_], facecolor="none",
                     edgecolor=model_color[ix], s=50,
                     label='{} support vectors'.format(kernel_label[ix]))
    axes[ix].scatter(X[np.setdiff1d(np.arange(len(X)), svr.support_)],
                     y[np.setdiff1d(np.arange(len(X)), svr.support_)],
                     facecolor="none", edgecolor="k", s=50,
                     label='other training data')
    axes[ix].legend(loc='upper center', bbox_to_anchor=(0.5, 1.1),
                    ncol=1, fancybox=True, shadow=True)

fig.text(0.5, 0.04, 'data', ha='center', va='center')
fig.text(0.06, 0.5, 'target', ha='center', va='center', rotation='vertical')
fig.suptitle("Support Vector Regression", fontsize=14)
plt.show()
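In SVR, the epsilon parameter defines a tube around the fitted curve: training points that fall strictly inside the tube carry no penalty and do not become support vectors. The sketch below illustrates this under the assumption of a fixed random seed and the RBF settings used in this example; the tolerance of 1e-2 in the comment is an assumption that allows for the solver's stopping criterion.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(8))  # noise on every fifth target

epsilon = 0.1
svr = SVR(kernel="rbf", C=100, gamma=0.1, epsilon=epsilon).fit(X, y)

# Absolute residual of every training point
residuals = np.abs(y - svr.predict(X))

# Boolean mask marking the support vectors
is_sv = np.zeros(len(X), dtype=bool)
is_sv[svr.support_] = True

# Non-support vectors should sit inside the epsilon tube (up to ~1e-2)
max_inside = residuals[~is_sv].max(initial=0.0)
print("support vectors:", int(is_sv.sum()), "of", len(X))
print("largest residual among the other points:", max_inside)
```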
[Figure: Support Vector Regression. Three panels (RBF, Linear, Polynomial), each showing the fitted model, its support vectors, and the other training data.]


SVM: Separating hyperplane for unbalanced classes


Find the optimal separating hyperplane using an SVC for classes that are unbalanced.


We first find the separating plane with a plain SVC and then plot the separating hyperplane with automatic
correction for unbalanced classes (drawn in red).


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# we create two clusters of random points
n_samples_1 = 1000
n_samples_2 = 100
centers = [[0.0, 0.0], [2.0, 2.0]]
clusters_std = [1.5, 0.5]
X, y = make_blobs(n_samples=[n_samples_1, n_samples_2],
                  centers=centers,
                  cluster_std=clusters_std,
                  random_state=0, shuffle=False)

# fit the model and get the separating hyperplane
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# fit the model and get the separating hyperplane using weighted classes
wclf = svm.SVC(kernel='linear', class_weight={1: 10})
wclf.fit(X, y)

# plot the samples
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors='k')

# plot the decision functions for both classifiers
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T

# get the separating hyperplane
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins
a = ax.contour(XX, YY, Z, colors='k', levels=[0], alpha=0.5, linestyles=['-'])

# get the separating hyperplane for weighted classes
Z = wclf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins for weighted classes
b = ax.contour(XX, YY, Z, colors='r', levels=[0], alpha=0.5, linestyles=['-'])

plt.legend([a.collections[0], b.collections[0]], ["non weighted", "weighted"],
           loc="upper right")
plt.show()
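The effect of class_weight can also be measured numerically. The sketch below, assuming the same two unbalanced blobs (1000 and 100 points), compares the recall on the minority class for the plain and the weighted classifier. Recall is computed on the training data only, purely for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics import recall_score

X, y = make_blobs(n_samples=[1000, 100],
                  centers=[[0.0, 0.0], [2.0, 2.0]],
                  cluster_std=[1.5, 0.5],
                  random_state=0, shuffle=False)

# Plain linear SVC versus one that weights class 1 ten times more heavily
clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)
wclf = svm.SVC(kernel="linear", class_weight={1: 10}).fit(X, y)

# Fraction of minority-class samples that each model recovers
recall_plain = recall_score(y, clf.predict(X), pos_label=1)
recall_weighted = recall_score(y, wclf.predict(X), pos_label=1)

print("minority recall, non weighted:", recall_plain)
print("minority recall, weighted:", recall_weighted)
```

Weighting the minority class shifts the hyperplane towards the majority class, which typically raises minority recall at the cost of more false positives.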
import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

# -------- Download the dataset (CSV) -------- #
# https://drive.google.com/file/d/1lmVmGNx6cbfvRHC DvF12ZL3wGLSHD9f /view

# Upload the file from your system to use this method.
# This works in Colaboratory (online).
from google.colab import files
uploaded = files.upload()

import io
dataset = pd.read_csv(io.BytesIO(uploaded['petrol_consumption.csv']))
# Here Soper.csv is your file name.

# dataset = pd.read_csv("D:/Datasets/petrol_consumption.csv")
# Use this form when running locally, for example in PyCharm.
dataset.head()


























import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

# -------- Download the dataset (CSV) -------- #
# https://drive.google.com/file/d/1lmVmGNx6cbfvRHC DvF12ZL3wGLSHD9f /view

# Upload the file from your system to use this method.
# This works in Colaboratory (online).
from google.colab import files
uploaded = files.upload()

import io
dataset = pd.read_csv(io.BytesIO(uploaded['petrol_consumption.csv']))
# Here Soper.csv is your file name.

# dataset = pd.read_csv("D:/Datasets/petrol_consumption.csv")
# Use this form when running locally, for example in PyCharm.

# Preparing the data for training
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train, y_train)
print(X_test, y_test)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

print(X_train)
print(X_test)

# Training the Algorithm
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

print(y_pred)
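Printing y_pred alone does not show how good the regression is. A minimal sketch of an evaluation step follows. It uses a synthetic dataset from make_regression as a stand-in, since the petrol consumption file is not available here; the sample count, noise level and metric choice are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 4 numeric features and one numeric target
X, y = make_regression(n_samples=48, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only, then reuse it on the test split
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Mean Absolute Error:", mae)
print("Root Mean Squared Error:", rmse)
```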










import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

# -------- Download the dataset (CSV) -------- #
# https://drive.google.com/file/d/13nw-uRXPY8XIZOxKRNZ3yYlho-CYm Ot/view

# Upload the file from your system to use this method.
# This works in Colaboratory (online).
from google.colab import files
uploaded = files.upload()

import io
dataset = pd.read_csv(io.BytesIO(uploaded['bill_authentication.csv']))
# Here Soper.csv is your file name.

# dataset = pd.read_csv("D:/Datasets/bill_authentication.csv")
# Use this form when running locally, for example in PyCharm.
dataset.head()







































import pandas as pd
import numpy as np
import seaborn as sns  # visualisation
import matplotlib.pyplot as plt  # visualisation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor

# -------- Download the dataset (CSV) -------- #
# https://drive.google.com/file/d/13nw-uRXPY8XIZOxKRNZ3yYlho-CYm Ot/view

# Upload the file from your system to use this method.
# This works in Colaboratory (online).
from google.colab import files
uploaded = files.upload()

import io
dataset = pd.read_csv(io.BytesIO(uploaded['bill_authentication.csv']))
# Here Soper.csv is your file name.

# dataset = pd.read_csv("D:/Datasets/bill_authentication.csv")
# Use this form when running locally, for example in PyCharm.

# Preparing the data for training
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train, y_train)
print(X_test, y_test)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

print(X_train)
print(X_test)

# Training the Algorithm
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

print(y_pred)
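The bill authentication target is a class label (genuine or not), so a classifier is the more usual choice than a regressor for this kind of data. The sketch below is one possible variant, not the script above: it swaps in RandomForestClassifier and uses a synthetic two-class dataset from make_classification as a stand-in for the CSV file, which is an assumption made so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 4 numeric features, binary target
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predictions
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print("Accuracy:", acc)
```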


Decision Tree Regression 


A 1D regression with a decision tree. The decision tree is used to fit a sine curve with additional noisy observations.
As a result, it learns local, piecewise-constant regressions approximating the sine curve. We can see that if the maximum depth
of the tree (controlled by the max_depth parameter) is set too high, the decision tree learns too fine details of the
training data and learns from the noise, i.e. it overfits.


# Import the necessary modules and libraries
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)

# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)

# Plot the results
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black",
            c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue",
         label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()
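The overfitting described for large max_depth values can be seen in numbers by comparing the error on the noisy training data with the error against the noise-free sine curve. This is a small sketch under the same data-generating assumptions as the example; the dictionary name results and the fully grown tree (max_depth=None) are additions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))  # noisy observations

# Noise-free reference curve used to judge generalisation
X_clean = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_clean = np.sin(X_clean).ravel()

results = {}
for depth in (2, 5, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    train_mse = mean_squared_error(y, tree.predict(X))
    clean_mse = mean_squared_error(y_clean, tree.predict(X_clean))
    results[depth] = (train_mse, clean_mse)
    print("max_depth=%s  train MSE=%.4f  MSE vs. clean sine=%.4f"
          % (depth, train_mse, clean_mse))
```

The training error can only shrink as the tree gets deeper, so a low training error on its own says little about how well the sine curve is recovered.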
Multi-output Decision Tree Regression


An example to illustrate multi-output regression with a decision tree. The decision tree is used to predict
simultaneously the noisy x and y observations of a circle given a single underlying feature. As a result, it learns
local regressions approximating the circle. We can see that if the maximum depth of the tree (controlled by
the max_depth parameter) is set too high, the decision tree learns too fine details of the training data and learns
from the noise, i.e. it overfits.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y[::5, :] += (0.5 - rng.rand(20, 2))

# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=8)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)

# Predict
X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)

# Plot the results
plt.figure()
s = 25
plt.scatter(y[:, 0], y[:, 1], c="navy", s=s,
            edgecolor="black", label="data")
plt.scatter(y_1[:, 0], y_1[:, 1], c="cornflowerblue", s=s,
            edgecolor="black", label="max_depth=2")
plt.scatter(y_2[:, 0], y_2[:, 1], c="red", s=s,
            edgecolor="black", label="max_depth=5")
plt.scatter(y_3[:, 0], y_3[:, 1], c="orange", s=s,
            edgecolor="black", label="max_depth=8")
plt.xlim([-6, 6])
plt.ylim([-6, 6])
plt.xlabel("target 1")
plt.ylabel("target 2")
plt.title("Multi-output Decision Tree Regression")
plt.legend(loc="best")
plt.show()
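In a multi-output tree, every leaf stores one value per target, so a single call to predict returns both coordinates at once. The sketch below, assuming the same circle data without the added noise, shows the shape of the prediction and why a shallow tree can only produce a handful of distinct points. The query values in X_new are arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
# Two targets per sample: the x and y coordinates of a point on a circle
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T

regr = DecisionTreeRegressor(max_depth=2).fit(X, y)

X_new = np.array([[-50.0], [0.0], [50.0]])
pred = regr.predict(X_new)        # one row per query, one column per target
n_leaves = regr.get_n_leaves()    # a depth-2 tree has at most 4 leaves

print("prediction shape:", pred.shape)
print("number of leaves:", n_leaves)
```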





Plot the decision surface of a decision tree on the iris dataset


Plot the decision surface of a decision tree trained on pairs of features of the iris dataset. See decision tree for
more information on the estimator. For each pair of iris features, the decision tree learns decision boundaries
made of combinations of simple thresholding rules inferred from the training samples. We also show the tree
structure of a model built on all of the features.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)
    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    cmap=plt.cm.RdYlBu, edgecolor='black', s=15)

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")

plt.figure()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
plot_tree(clf, filled=True)
plt.show()
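The thresholding rules that make up a decision surface can be printed as text, which is sometimes easier to read than the plotted tree. The sketch below uses export_text from sklearn.tree on one feature pair; limiting the tree to max_depth=2 and fixing random_state are assumptions made to keep the printed rules short and reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
pair = [0, 1]  # sepal length and sepal width
X = iris.data[:, pair]
y = iris.target

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each printed line is one threshold test or one leaf with its class
names = [iris.feature_names[i] for i in pair]
rules = export_text(clf, feature_names=names)
print(rules)
```

Every "<=" line in the printout corresponds to one axis-parallel edge of the decision surface.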


OUTPUT 
Post pruning decision trees with cost complexity pruning


The DecisionTreeClassifier provides parameters such as min_samples_leaf and max_depth to prevent a tree
from overfitting. Cost complexity pruning provides another option to control the size of a tree. In
DecisionTreeClassifier, this pruning technique is parameterized by the cost complexity parameter, ccp_alpha.
Greater values of ccp_alpha increase the number of nodes pruned. Here we only show the effect of ccp_alpha
on regularizing the trees and how to choose a ccp_alpha based on validation scores.


import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

fig, ax = plt.subplots()
ax.plot(ccp_alphas[:-1], impurities[:-1], marker='o', drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")

Text(0.5, 1.0, 'Total Impurity vs effective alpha for training set')


OUTPUT

[Figure: Total Impurity vs effective alpha for training set]
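The pruning path only lists candidate ccp_alpha values; choosing one means fitting a tree per value and comparing scores. The sketch below shows one way to do this, using the held-out test split as the validation set. The clipping of tiny negative alphas to zero is a defensive assumption against floating-point noise, and the names clfs, node_counts and best are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
# Guard against tiny negative values caused by floating-point error
ccp_alphas = np.clip(path.ccp_alphas, 0.0, None)

# One tree per candidate alpha: larger alpha means heavier pruning
clfs = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
        for a in ccp_alphas]
node_counts = [c.tree_.node_count for c in clfs]
test_scores = [c.score(X_test, y_test) for c in clfs]

best = int(np.argmax(test_scores))
print("nodes per alpha:", node_counts)
print("best ccp_alpha: %.5f (accuracy %.3f, %d nodes)"
      % (ccp_alphas[best], test_scores[best], node_counts[best]))
```

With more data, a separate validation split or cross-validation would be a cleaner way to pick ccp_alpha than reusing the test set.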
Gaussian Naive Bayes


GaussianNB implements the Gaussian Naive Bayes algorithm for classification.


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))


OUTPUT

Number of mislabeled points out of a total 75 points : 4


Naive Bayes Classifier implementation in Scikit-Learn


from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB


def getWeather():
    return ['Clear', 'Clear', 'Clear', 'Clear', 'Clear', 'Clear',
            'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy',
            'Snowy', 'Snowy', 'Snowy', 'Snowy', 'Snowy', 'Snowy']


def getTimeOfWeek():
    return ['Workday', 'Workday', 'Workday',
            'Weekend', 'Weekend', 'Weekend',
            'Workday', 'Workday', 'Workday',
            'Weekend', 'Weekend', 'Weekend',
            'Workday', 'Workday', 'Workday',
            'Weekend', 'Weekend', 'Weekend']


def getTimeOfDay():
    return ['Morning', 'Lunch', 'Evening',
            'Morning', 'Lunch', 'Evening',
            'Morning', 'Lunch', 'Evening',
            'Morning', 'Lunch', 'Evening',
            'Morning', 'Lunch', 'Evening',
            'Morning', 'Lunch', 'Evening']


def getTrafficJam():
    return ['Yes', 'No', 'Yes',
            'No', 'No', 'No',
            'Yes', 'Yes', 'Yes',
            'No', 'No', 'No',
            'Yes', 'Yes', 'Yes',
            'Yes', 'No', 'Yes']

# Label Encoder

weather = ['Clear', 'Clear', 'Clear', 'Clear', 'Clear', 'Clear',
           'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy',
           'Snowy', 'Snowy', 'Snowy', 'Snowy', 'Snowy', 'Snowy']

labelEncoder = preprocessing.LabelEncoder()
print(labelEncoder.fit_transform(weather))

trafficJam = ['Yes', 'No', 'Yes',
              'No', 'No', 'No',
              'Yes', 'Yes', 'Yes',
              'No', 'No', 'No',
              'Yes', 'Yes', 'Yes',
              'Yes', 'No', 'Yes']
print(labelEncoder.fit_transform(trafficJam))
# -------- Training the Naive Bayes model -------- #
# Get the data
weather = getWeather()
timeOfWeek = getTimeOfWeek()
timeOfDay = getTimeOfDay()
trafficJam = getTrafficJam()
labelEncoder = preprocessing.LabelEncoder()

# Encode the features and the labels
encodedWeather = labelEncoder.fit_transform(weather)
encodedTimeOfWeek = labelEncoder.fit_transform(timeOfWeek)
encodedTimeOfDay = labelEncoder.fit_transform(timeOfDay)
encodedTrafficJam = labelEncoder.fit_transform(trafficJam)

# Build the features
features = []
for i in range(len(encodedWeather)):
    features.append([encodedWeather[i], encodedTimeOfWeek[i], encodedTimeOfDay[i]])

model = GaussianNB()

# Train the model
model.fit(features, encodedTrafficJam)

# -------- Using the model to make predictions -------- #
# ["Snowy", "Workday", "Morning"]
print(model.predict([[2, 1, 2]]))
# Prints [1], meaning "Yes"

# ["Clear", "Weekend", "Lunch"]
print(model.predict([[0, 0, 1]]))
# Prints [0], meaning "No"


OUTPUT

[1 0 1 0 0 0 1 1 1 0 0 0 1 1 1 1 0 1]
[1]
[0]
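Hand-typing encoded values such as [2, 1, 2] is easy to get wrong, because LabelEncoder assigns integers in alphabetical order of the labels. The sketch below is one possible refinement under the same traffic data: it keeps a separate encoder per column so that queries can be written with the original strings and the answer is decoded back to "Yes" or "No". The helper predict_jam and the variable names are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

weather = ['Clear'] * 6 + ['Rainy'] * 6 + ['Snowy'] * 6
time_of_week = (['Workday'] * 3 + ['Weekend'] * 3) * 3
time_of_day = ['Morning', 'Lunch', 'Evening'] * 6
traffic_jam = ['Yes', 'No', 'Yes', 'No', 'No', 'No',
               'Yes', 'Yes', 'Yes', 'No', 'No', 'No',
               'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']

# One encoder per column, so each mapping can be reused at prediction time
columns = [weather, time_of_week, time_of_day]
encoders = [LabelEncoder().fit(col) for col in columns]
target_encoder = LabelEncoder().fit(traffic_jam)

features = np.column_stack(
    [enc.transform(col) for enc, col in zip(encoders, columns)])
model = GaussianNB().fit(features, target_encoder.transform(traffic_jam))


def predict_jam(weather_value, week_value, day_value):
    """Encode one query with the fitted encoders and decode the prediction."""
    values = (weather_value, week_value, day_value)
    row = [[enc.transform([v])[0] for enc, v in zip(encoders, values)]]
    return target_encoder.inverse_transform(model.predict(row))[0]


print(predict_jam('Snowy', 'Workday', 'Morning'))
print(predict_jam('Clear', 'Weekend', 'Lunch'))
```

Note that GaussianNB treats the integer codes as continuous values; for purely categorical features, sklearn's CategoricalNB is often a closer match.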


K-Means clustering on the handwritten digits data


import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import numpy as np
from sklearn.datasets import load_digits

data, labels = load_digits(return_X_y=True)
(n_samples, n_features), n_digits = data.shape, np.unique(labels).size
reduced_data = PCA(n_components=2).fit_transform(data)
kmeans = KMeans(init="k-means++", n_clusters=n_digits, n_init=4)
kmeans.fit(reduced_data)

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .02

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation="nearest",
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired, aspect="auto", origin="lower")

plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)

# Plot the centroids as a white X
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1], marker="x", s=169, linewidths=3,
            color="w", zorder=10)
plt.title("K-means clustering on the digits dataset (PCA-reduced data)\n"
          "Centroids are marked with white cross")
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
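The coloured regions in such a plot are simply nearest-centroid assignments. The sketch below makes that explicit by recomputing the labels with plain NumPy and comparing them to kmeans.predict. It assumes n_clusters=10 and a fixed random_state for reproducibility; the names sq_dist, manual_labels and agreement are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

data, labels = load_digits(return_X_y=True)
reduced = PCA(n_components=2).fit_transform(data)
kmeans = KMeans(init="k-means++", n_clusters=10, n_init=4,
                random_state=0).fit(reduced)

# Squared Euclidean distance from every sample to every centroid,
# shape (n_samples, n_clusters)
diffs = reduced[:, np.newaxis, :] - kmeans.cluster_centers_[np.newaxis, :, :]
sq_dist = (diffs ** 2).sum(axis=2)

# Each sample belongs to its nearest centroid
manual_labels = sq_dist.argmin(axis=1)
agreement = np.mean(manual_labels == kmeans.predict(reduced))

# Inertia is the sum of squared distances to the assigned centroid
manual_inertia = sq_dist.min(axis=1).sum()
print("agreement with kmeans.predict:", agreement)
print("manual inertia:", manual_inertia, "reported:", kmeans.inertia_)
```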
Comparison of the K-Means and MiniBatchKMeans clustering algorithms


We want to compare the performance of the MiniBatchKMeans and KMeans: the MiniBatchKMeans is faster, but
gives slightly different results (see Mini Batch K-Means). We will cluster a set of data, first with KMeans and then
with MiniBatchKMeans, and plot the results. We will also plot the points that are labelled differently between the
two algorithms.


import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans, KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets import make_blobs

# #############################################################################
# Generate sample data
np.random.seed(0)
batch_size = 45
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)

# #############################################################################
# Compute clustering with KMeans
k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
t0 = time.time()
k_means.fit(X)
t_batch = time.time() - t0

# #############################################################################
# Compute clustering with MiniBatchKMeans
mbk = MiniBatchKMeans(init='k-means++', n_clusters=3, batch_size=batch_size,
                      n_init=10, max_no_improvement=10, verbose=0)
t0 = time.time()
mbk.fit(X)
t_mini_batch = time.time() - t0

# #############################################################################
# Plot result
fig = plt.figure(figsize=(8, 3))
fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
colors = ['#4EACC5', '#FF9C34', '#4E9A06']

# We want to have the same colors for the same cluster from the
# MiniBatchKMeans and the KMeans algorithm. Let's pair the cluster centers per
# closest one.
k_means_cluster_centers = k_means.cluster_centers_
order = pairwise_distances_argmin(k_means.cluster_centers_,
                                  mbk.cluster_centers_)
mbk_means_cluster_centers = mbk.cluster_centers_[order]
k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)
mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)

# KMeans
ax = fig.add_subplot(1, 3, 1)
for k, col in zip(range(n_clusters), colors):
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    ax.plot(X[my_members, 0], X[my_members, 1], 'w',
            markerfacecolor=col, marker='.')
    ax.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
            markeredgecolor='k', markersize=6)
ax.set_title('KMeans')
ax.set_xticks(())
ax.set_yticks(())
plt.text(-3.5, 1.8, 'train time: %.2fs\ninertia: %f' % (
    t_batch, k_means.inertia_))

# MiniBatchKMeans
ax = fig.add_subplot(1, 3, 2)
for k, col in zip(range(n_clusters), colors):
    my_members = mbk_means_labels == k
    cluster_center = mbk_means_cluster_centers[k]
    ax.plot(X[my_members, 0], X[my_members, 1], 'w',
            markerfacecolor=col, marker='.')
    ax.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
            markeredgecolor='k', markersize=6)
ax.set_title('MiniBatchKMeans')
ax.set_xticks(())
ax.set_yticks(())
plt.text(-3.5, 1.8, 'train time: %.2fs\ninertia: %f' %
         (t_mini_batch, mbk.inertia_))

# Initialise the different array to all False
different = (mbk_means_labels == 4)
ax = fig.add_subplot(1, 3, 3)

for k in range(n_clusters):
    different += ((k_means_labels == k) != (mbk_means_labels == k))

identic = np.logical_not(different)
ax.plot(X[identic, 0], X[identic, 1], 'w',
        markerfacecolor='#bbbbbb', marker='.')
ax.plot(X[different, 0], X[different, 1], 'w',
        markerfacecolor='m', marker='.')
ax.set_title('Difference')
ax.set_xticks(())
ax.set_yticks(())

plt.show()


OUTPUT

[Figure: three panels titled KMeans, MiniBatchKMeans and Difference. The two clustering panels are annotated "train time: 0.06s, inertia: 2470.583458" and "train time: 0.06s, inertia: 2461.442870".]
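The "Difference" panel can be summarised as a single number: the share of points that the two algorithms assign to different clusters once their centers have been paired. The sketch below computes it without plotting, assuming the same blob parameters and fixed random_state values (the seeds are an assumption added for reproducibility).

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import pairwise_distances_argmin

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7,
                  random_state=0)

k_means = KMeans(init="k-means++", n_clusters=3, n_init=10,
                 random_state=0).fit(X)
mbk = MiniBatchKMeans(init="k-means++", n_clusters=3, batch_size=45,
                      n_init=10, max_no_improvement=10,
                      random_state=0).fit(X)

# Pair every KMeans center with its closest MiniBatchKMeans center so that
# cluster index k refers to the same region in both models
order = pairwise_distances_argmin(k_means.cluster_centers_,
                                  mbk.cluster_centers_)
mbk_centers = mbk.cluster_centers_[order]

k_labels = pairwise_distances_argmin(X, k_means.cluster_centers_)
mbk_labels = pairwise_distances_argmin(X, mbk_centers)

fraction_different = np.mean(k_labels != mbk_labels)
print("points labelled differently: %.2f%%" % (100 * fraction_different))
print("inertia KMeans: %.2f, MiniBatchKMeans: %.2f"
      % (k_means.inertia_, mbk.inertia_))
```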





Hierarchical clustering: structured vs unstructured ward 


In a first step, the hierarchical clustering is performed without connectivity constraints on the structure and is
solely based on distance, whereas in a second step the clustering is restricted to the k-Nearest Neighbors graph:
it is a hierarchical clustering with a structure prior.


Some of the clusters learned without connectivity constraints do not respect the structure of the swiss roll and
extend across different folds of the manifold. In contrast, when imposing connectivity constraints, the
clusters form a nice parcellation of the swiss roll.


import time as time
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as p3
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll

# #############################################################################
# Generate data (swiss roll dataset)
n_samples = 1500
noise = 0.05
X, _ = make_swiss_roll(n_samples, noise=noise)
# Make it thinner
X[:, 1] *= .5

# #############################################################################
# Compute clustering
print("Compute unstructured hierarchical clustering...")
st = time.time()
ward = AgglomerativeClustering(n_clusters=6, linkage='ward').fit(X)
elapsed_time = time.time() - st
label = ward.labels_
print("Elapsed time: %.2fs" % elapsed_time)
print("Number of points: %i" % label.size)

# #############################################################################
# Plot result
fig = plt.figure()
ax = p3.Axes3D(fig)
ax.view_init(7, -80)
for l in np.unique(label):
    ax.scatter(X[label == l, 0], X[label == l, 1], X[label == l, 2],
               color=plt.cm.jet(float(l) / np.max(label + 1)),
               s=20, edgecolor='k')
plt.title('Without connectivity constraints (time %.2fs)' % elapsed_time)


tt THEE HT HHH Hit HH Ht HH HEH EEO OE ESE OE ato oa aoe PoE ESE PSE SEES SEES HES aoa ESE EEE EP 
# Define the structure A of the data. Here a 10 nearest neighbors 

ERO SLE n Mec moors Tio Kinteieidloo~s creel 

SOnMSeC EI ey = itereilsors Creepin, © merGliocrs=10, nae lucle selirare les) 


ee 

# Compute clustering 

(henlione (MCouie Wiese Sieiciblkeiebhalciol Jelshovas nae olikere I el hbhs ersuaabialery fey) 

st = time.time() 

Weneo! = Acie lice tc 1 eC Inne Sie Ling (it CIS cers =o, COMMSCEI Lily =COnnNeori wey 
linkage='ward') .f£1it (X) 

SicioeSol ili = “Elms. rile |) = Se 

leVoreL = yivetietchs Wellore ite) 

[OreaLiqe, | y leyoescl Times wa 2S = elheoscicl Icaiis), 


(e) 


iSueaione (MINhu@ileioie tenn jeleaties.2 “cal aa JlsileSll salu, 


t OPER E HE E H HF HE HF HH HF HH EE EE EE HEH EOE EH EH FE EOE EE EOE EH EOE EE ESE EEE EEE ESE SE EO ESE EEE EE EEE 
# Plot result 
fig = plt.figure() 
ax = p3.Axes3D(f1g) 
asowiew dide dy, ell) 
for 1 in np.unique(label): 
ax.scatter (X[label == 1, O], X[label == 1, 1], X[label == l, 2], 
eollene— ole seul ase Ve lesie( ll) 7 inj sitesx (ibeloeil se il) ) . 
s=20, edgecolor='k"') 
(ULE ae lel Wiis, Comiscimiwiie, Comsiieciiaes (ells oo 4irs) / @ Sileyoeocl ims) 


[Scbis . Slavery | 


OUTPUT 


Compute unstructured hierarchical clustering...
Elapsed time: 0.08s
Number of points: 1500
Compute structured hierarchical clustering...
Elapsed time: 0.14s
Number of points: 1500


Without connectivity constraints (time 0.08s) With connectivity constraints (time 0.14s) 





K-means Clustering 


The plots first show what a K-means algorithm would yield using three clusters. They then show the effect of a
bad initialization on the clustering: setting n_init to only 1 (the default is 10) reduces the number of
times the algorithm is run with different centroid seeds. The next plot shows what using
eight clusters would deliver, and the last one shows the ground truth.
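The role of n_init can be illustrated with a small NumPy sketch of Lloyd's algorithm that is restarted from several random initializations and keeps the run with the lowest inertia. The functions lloyd and best_of, the seeds and the toy data are assumptions made for this illustration; this is not scikit-learn's implementation.

```python
import numpy as np


def lloyd(X, k, seed, n_iter=50):
    """One K-means run from a random initialisation; returns (labels, inertia)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its closest centroid.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its samples (keep it if empty).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = dist.argmin(axis=1)
    inertia = dist.min(axis=1).sum()  # sum of squared distances to centroids
    return labels, inertia


def best_of(X, k, n_init):
    """Repeat lloyd() with seeds 0..n_init-1 and keep the lowest inertia."""
    runs = [lloyd(X, k, seed) for seed in range(n_init)]
    return min(runs, key=lambda run: run[1])


rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 2))
               for c in (0.0, 3.0, 6.0)])
_, single = best_of(X, k=3, n_init=1)
_, multi = best_of(X, k=3, n_init=10)
print(single, multi)
```

Because the ten-run search includes the single run's seed, its inertia can never be worse; a single unlucky start, by contrast, has no fallback.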


import numpy as np
import matplotlib.pyplot as plt
# Though the following import is not directly being used, it is required
# for 3D projection to work
from mpl_toolkits.mplot3d import Axes3D

from sklearn.cluster import KMeans
from sklearn import datasets

np.random.seed(5)

iris = datasets.load_iris()
X = iris.data
y = iris.target

estimators = [('k_means_iris_8', KMeans(n_clusters=8)),
              ('k_means_iris_3', KMeans(n_clusters=3)),
              ('k_means_iris_bad_init', KMeans(n_clusters=3, n_init=1,
                                               init='random'))]

fignum = 1
titles = ['8 clusters', '3 clusters', '3 clusters, bad initialization']
for name, est in estimators:
    fig = plt.figure(fignum, figsize=(4, 3))
    ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)
    fig.add_axes(ax)  # recent Matplotlib versions do not add the axes automatically
    est.fit(X)
    labels = est.labels_

    ax.scatter(X[:, 3], X[:, 0], X[:, 2],
               c=labels.astype(float), edgecolor='k')

    ax.xaxis.set_ticklabels([])
    ax.yaxis.set_ticklabels([])
    ax.zaxis.set_ticklabels([])
    ax.set_xlabel('Petal width')
    ax.set_ylabel('Sepal length')
    ax.set_zlabel('Petal length')
    ax.set_title(titles[fignum - 1])
    ax.dist = 12
    fignum = fignum + 1

# Plot the ground truth
fig = plt.figure(fignum, figsize=(4, 3))
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)
fig.add_axes(ax)

for name, label in [('Setosa', 0),
                    ('Versicolour', 1),
                    ('Virginica', 2)]:
    ax.text3D(X[y == label, 3].mean(),
              X[y == label, 0].mean(),
              X[y == label, 2].mean() + 2, name,
              horizontalalignment='center',
              bbox=dict(alpha=.2, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(float)
ax.scatter(X[:, 3], X[:, 0], X[:, 2], c=y, edgecolor='k')

ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])
ax.set_xlabel('Petal width')
ax.set_ylabel('Sepal length')
ax.set_zlabel('Petal length')
ax.set_title('Ground Truth')
ax.dist = 12

fig.show()


OUTPUT 


[Figure panels: "8 clusters", "3 clusters", "Ground Truth" (class labels include Versicolour and Virginica); axis label: Petal length]


Compare Stochastic learning strategies for MLPClassifier 


This example visualizes some training loss curves for different stochastic learning strategies, including SGD and
Adam. Because of time constraints, we use several small datasets, for which L-BFGS might be more suitable.
The general trend shown in these examples seems to carry over to larger datasets, however. Note that those
results can be highly dependent on the value of learning_rate_init.
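The difference between a constant and an inverse-scaling learning rate can be sketched in plain Python on a toy objective. The function names, the objective f(w) = w**2 and the exponent power_t=0.5 are assumptions chosen for illustration; the decay rule learning_rate_init / t**power_t follows the description of the 'invscaling' schedule in the scikit-learn documentation.

```python
def constant_rate(learning_rate_init, t):
    """Same step size at every iteration t."""
    return learning_rate_init


def invscaling_rate(learning_rate_init, t, power_t=0.5):
    """Step size that decays as learning_rate_init / t**power_t."""
    return learning_rate_init / (t ** power_t)


def sgd(grad, w, rate_fn, learning_rate_init=0.2, momentum=0.0, n_steps=100):
    """Gradient descent with optional classical momentum on a scalar parameter."""
    velocity = 0.0
    for t in range(1, n_steps + 1):
        step = rate_fn(learning_rate_init, t)
        velocity = momentum * velocity - step * grad(w)
        w = w + velocity
    return w


def grad(w):
    # Gradient of the toy objective f(w) = w**2, whose minimum is at w = 0.
    return 2.0 * w


w_constant = sgd(grad, 1.0, constant_rate)
w_invscaling = sgd(grad, 1.0, invscaling_rate)
print(w_constant, w_invscaling)
```

On this toy problem the decaying step size ends up further from the minimum after the same number of steps, which is consistent with the slower progress of the inv-scaling curves reported in this example.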


import warnings

import matplotlib.pyplot as plt

from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning

# different learning rate schedules and momentum parameters
params = [{'solver': 'sgd', 'learning_rate': 'constant', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'constant', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': 0,
           'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': False, 'learning_rate_init': 0.2},
          {'solver': 'sgd', 'learning_rate': 'invscaling', 'momentum': .9,
           'nesterovs_momentum': True, 'learning_rate_init': 0.2},
          {'solver': 'adam', 'learning_rate_init': 0.01}]

labels = ["constant learning-rate", "constant with momentum",
          "constant with Nesterov's momentum",
          "inv-scaling learning-rate", "inv-scaling with momentum",
          "inv-scaling with Nesterov's momentum", "adam"]

plot_args = [{'c': 'red', 'linestyle': '-'},
             {'c': 'green', 'linestyle': '-'},
             {'c': 'blue', 'linestyle': '-'},
             {'c': 'red', 'linestyle': '--'},
             {'c': 'green', 'linestyle': '--'},
             {'c': 'blue', 'linestyle': '--'},
             {'c': 'black', 'linestyle': '-'}]


def plot_on_dataset(X, y, ax, name):
    # for each dataset, plot learning for each learning strategy
    print("\nlearning on dataset %s" % name)
    ax.set_title(name)

    X = MinMaxScaler().fit_transform(X)
    mlps = []
    if name == "digits":
        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400

    for label, param in zip(labels, params):
        print("training: %s" % label)
        mlp = MLPClassifier(random_state=0,
                            max_iter=max_iter, **param)

        # some parameter combinations will not converge as can be seen on the
        # plots so they are ignored here
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=ConvergenceWarning,
                                    module="sklearn")
            mlp.fit(X, y)

        mlps.append(mlp)
        print("Training set score: %f" % mlp.score(X, y))
        print("Training set loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)


fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# load / generate some toy datasets
iris = datasets.load_iris()
X_digits, y_digits = datasets.load_digits(return_X_y=True)
data_sets = [(iris.data, iris.target),
             (X_digits, y_digits),
             datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
             datasets.make_moons(noise=0.3, random_state=0)]

for ax, data, name in zip(axes.ravel(), data_sets, ['iris', 'digits',
                                                    'circles', 'moons']):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels, ncol=3, loc="upper center")
plt.show()


OUTPUT 


learning on dataset iris
training: constant learning-rate
Training set score: 0.980000
Training set loss: 0.096950
training: constant with momentum
Training set score: 0.980000
Training set loss: 0.049530
training: constant with Nesterov's momentum
Training set score: 0.980000
Training set loss: 0.049540
training: inv-scaling learning-rate
Training set score: 0.360000
Training set loss: 0.978444
training: inv-scaling with momentum
Training set score: 0.860000
Training set loss: 0.503452
training: inv-scaling with Nesterov's momentum
Training set score: 0.860000
Training set loss: 0.504185
training: adam
Training set score: 0.980000
Training set loss: 0.045311

learning on dataset circles
training: constant learning-rate
Training set score: 0.840000
Training set loss: 0.601052
training: constant with momentum
Training set score: 0.940000
Training set loss: 0.157334
training: constant with Nesterov's momentum
Training set score: 0.940000
Training set loss: 0.154453
training: inv-scaling learning-rate
Training set score: 0.500000
Training set loss: 0.692470
training: inv-scaling with momentum
Training set score: 0.500000
Training set loss: 0.689143
training: inv-scaling with Nesterov's momentum
Training set score: 0.500000
Training set loss: 0.689751
training: adam
Training set score: 0.940000
Training set loss: 0.150527

learning on dataset digits
training: constant learning-rate
Training set score: 0.956038
Training set loss: 0.243802
training: constant with momentum
Training set score: 0.992766
Training set loss: 0.041297
training: constant with Nesterov's momentum
Training set score: 0.993879
Training set loss: 0.042898
training: inv-scaling learning-rate
Training set score: 0.638843
Training set loss: 1.855465
training: inv-scaling with momentum
Training set score: 0.912632
Training set loss: 0.290584
training: inv-scaling with Nesterov's momentum
Training set score: 0.909293
Training set loss: 0.318387
training: adam
Training set score: 0.991653
Training set loss: 0.045934

learning on dataset moons
training: constant learning-rate
Training set score: 0.850000
Training set loss: 0.341523
training: constant with momentum
Training set score: 0.850000
Training set loss: 0.336188
training: constant with Nesterov's momentum
Training set score: 0.850000
Training set loss: 0.335919
training: inv-scaling learning-rate
Training set score: 0.500000
Training set loss: 0.689015
training: inv-scaling with momentum
Training set score: 0.830000
Training set loss: 0.512595
training: inv-scaling with Nesterov's momentum
Training set score: 0.830000
Training set loss: 0.513034
training: adam
Training set score: 0.930000
Training set loss: 0.170087

[Figure legend: constant learning-rate; constant with momentum; constant with Nesterov's momentum; inv-scaling learning-rate; inv-scaling with momentum; inv-scaling with Nesterov's momentum; adam]


Restricted Boltzmann Machine features for digit classification


For greyscale image data where pixel values can be interpreted as degrees of blackness on a white background,
like handwritten digit recognition, the Bernoulli Restricted Boltzmann machine model (BernoulliRBM) can perform
effective non-linear feature extraction. In order to learn good latent representations from a small dataset, we
artificially generate more labeled data by perturbing the training data with linear shifts of 1 pixel in each direction.
This example shows how to build a classification pipeline with a BernoulliRBM feature extractor and a
LogisticRegression classifier. The hyperparameters of the entire model (learning rate, hidden layer size,
regularization) were optimized by grid search, but the search is not reproduced here because of runtime
constraints. Logistic regression on raw pixel values is presented for comparison. The example shows that the
features extracted by the BernoulliRBM help improve the classification accuracy.
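The data augmentation by one-pixel shifts can be sketched with NumPy slicing alone. The helpers shift_image and nudge, and the toy images, are assumptions made for illustration; the listing in this section achieves the same effect with a convolution.

```python
import numpy as np


def shift_image(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated cells with zeros."""
    shifted = np.zeros_like(image)
    h, w = image.shape
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] = src
    return shifted


def nudge(images, labels):
    """Return the originals plus four one-pixel shifts: 5 times more samples."""
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    shifted = [np.stack([shift_image(img, dy, dx) for img in images])
               for dy, dx in offsets]
    return np.concatenate(shifted), np.tile(labels, len(offsets))


# Two hypothetical 8x8 "digits", each with a single bright pixel.
images = np.zeros((2, 8, 8))
images[:, 3, 3] = 1.0
labels = np.array([0, 1])
X_aug, y_aug = nudge(images, labels)
print(X_aug.shape, y_aug)
```

Each shifted copy keeps the label of its source image, so the classifier sees the same digit in slightly different positions.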


import numpy as np
import matplotlib.pyplot as plt

from scipy.ndimage import convolve
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.base import clone


# #############################################################################
# Setting up

def nudge_dataset(X, Y):
    """
    This produces a dataset 5 times bigger than the original one,
    by moving the 8x8 images in X around by 1px to left, right, down, up
    """
    direction_vectors = [
        [[0, 1, 0],
         [0, 0, 0],
         [0, 0, 0]],

        [[0, 0, 0],
         [1, 0, 0],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 0],
         [0, 1, 0]]]

    def shift(x, w):
        return convolve(x.reshape((8, 8)), mode='constant', weights=w).ravel()

    X = np.concatenate([X] +
                       [np.apply_along_axis(shift, 1, X, vector)
                        for vector in direction_vectors])
    Y = np.concatenate([Y for _ in range(5)], axis=0)
    return X, Y


# Load Data
X, y = datasets.load_digits(return_X_y=True)
X = np.asarray(X, 'float32')
X, Y = nudge_dataset(X, y)
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

# Models we will use
logistic = linear_model.LogisticRegression(solver='newton-cg', tol=1)
rbm = BernoulliRBM(random_state=0, verbose=True)

rbm_features_classifier = Pipeline(
    steps=[('rbm', rbm), ('logistic', logistic)])

# #############################################################################
# Training

# Hyper-parameters. These were set by cross-validation,
# using a GridSearchCV. Here we are not performing cross-validation to
# save time.
rbm.learning_rate = 0.06
rbm.n_iter = 10
# More components tend to give better prediction performance, but larger
# fitting time
rbm.n_components = 100
logistic.C = 6000

# Training RBM-Logistic Pipeline
rbm_features_classifier.fit(X_train, Y_train)

# Training the Logistic regression classifier directly on the pixel
raw_pixel_classifier = clone(logistic)
raw_pixel_classifier.C = 100.
raw_pixel_classifier.fit(X_train, Y_train)

# #############################################################################
# Evaluation

Y_pred = rbm_features_classifier.predict(X_test)
print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(Y_test, Y_pred)))

Y_pred = raw_pixel_classifier.predict(X_test)
print("Logistic regression using raw pixel features:\n%s\n" % (
    metrics.classification_report(Y_test, Y_pred)))

# #############################################################################
# Plotting

plt.figure(figsize=(4.2, 4))
for i, comp in enumerate(rbm.components_):
    plt.subplot(10, 10, i + 1)
    plt.imshow(comp.reshape((8, 8)), cmap=plt.cm.gray_r,
               interpolation='nearest')
    plt.xticks(())
    plt.yticks(())
plt.suptitle('100 components extracted by RBM', fontsize=16)
plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)

plt.show()

OUTPUT 


[BernoulliRBM] Iteration 1, pseudo-likelihood = -25.39,
[BernoulliRBM] Iteration 2, pseudo-likelihood = -23.77,
[BernoulliRBM] Iteration 3, pseudo-likelihood = -22.94,
[BernoulliRBM] Iteration 4, pseudo-likelihood = -21.91,
[BernoulliRBM] Iteration 5, pseudo-likelihood = -21.69,
[BernoulliRBM] Iteration 6, pseudo-likelihood = -21.06,
[BernoulliRBM] Iteration 7, pseudo-likelihood = -20.89,
[BernoulliRBM] Iteration 8, pseudo-likelihood = -20.64,
[BernoulliRBM] Iteration 9, pseudo-likelihood = -20.36,
[BernoulliRBM] Iteration 10, pseudo-likelihood = -20.09,
Logistic regression using RBM features: 
              precision    recall  f1-score   support
0 Oa gS crs: 0.99 Ly 
iL Sul OS ois On S8 184 
Z Os 0.96 OF9'5 Ione 
3 0.94 Ore’, Cle OZ 194 
4 Clg St cl, Ds Oe ME 186 
5 Oro O92 Ol. 32 IAL 
6 0.98 0.97 On Ba LOT 
7 O23 0.99 0.96 164 
3) Gis 3 Ores: One eZ 
) Once Ble, Gil 0.90 169 
    accuracy                           0.94      1797
   macro avg       0.94      0.94      0.94      1797
weighted avg       0.94      0.94      0.94      1797
Logistic regression using raw pixel features: 
              precision    recall  f1-score   support
0 0.90 Oe eZ OH Ly 
1 0.60 Ovare: On Oe 134 
Zz 0.76 Cleteks Oro.) Iesle 
3 GLa We: Ole FY OA TEs 194 
4 Os el 0.84 ORs 186 
5 Oe ant On 16 OFAEG peal 
6 O). Qu Or ce 0.89 70g 
7 Oates 0.88 Oey Lod 
ts) Ge Gal OL pere: O62 eZ 
) Olea O16 OR is ions, 
ae Clie @ 7 Gh. hs ES 
Mee Omer 7 C) Oks hs Gly Tes OFes Ley 
weighted avg Cras Oe Tis Ge das A 


time = 0.18s 
time = 0.32s 
time = 0.30s 
time = 0.33s 
time = 0.31s 
time = 0.30s 
time = 0.31s 
time = 0.31s 
time = 0.29s 
time = 0.33s 
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The digits dataset consists of 8x8 pixel images of digits. The images attribute of the dataset stores 8x8 arrays of 
grayscale values for each image. We will use these arrays to visualize the first 4 images. The target attribute of 
the dataset stores the digit each image represents and this is included in the title of the 4 plots below. 


Note: if we were working from image files (e.g., png files), we would load them using matplotlib.pyplot.imread. 


# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# Digits dataset
digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Training: %i' % label)


OUTPUT


Recognizing hand-written digits: Classification


To apply a classifier on this data, we need to flatten the images, turning each 2-D array of grayscale values from 
shape (8, 8) into shape (64,). Subsequently, the entire dataset will be of shape (n_samples, n_features), where 
n_samples is the number of images and n_features is the total number of pixels in each image. 
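The flattening step is a plain reshape. The sketch below uses a small stand-in array instead of the real digits.images so that it runs without scikit-learn; the array contents are an assumption made for illustration.

```python
import numpy as np

# Stand-in for digits.images: 3 images of 8x8 pixels.
images = np.arange(3 * 8 * 8).reshape(3, 8, 8)

n_samples = len(images)
# -1 lets NumPy infer the second dimension (8 * 8 = 64 features).
data = images.reshape((n_samples, -1))
print(data.shape)

# Flattening is lossless: reshaping a row back recovers the image.
restored = data[2].reshape(8, 8)
```

Rows of data correspond to samples and columns to pixels, which is the (n_samples, n_features) layout that scikit-learn estimators expect.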


We can then split the data into train and test subsets and fit a support vector classifier on the train samples. The 
fitted classifier can subsequently be used to predict the value of the digit for the samples in the test subset. 


# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# Digits dataset
digits = datasets.load_digits()

# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# Learn the digits on the train subset
clf.fit(X_train, y_train)

# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title(f'Prediction: {prediction}')

print(f"Classification report for classifier {clf}:\n"
      f"{metrics.classification_report(y_test, predicted)}\n")


OUTPUT


Classification report for classifier SVC(C=1.0, break_ties=False, cache_size=200,
    class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False):

              precision    recall  f1-score   support

0 th s0ke Cro Cro oh 

iL OC. 2S Os oy Cha IS ok 

Z Oa gS Oa Os On gS 86 

8 Oa Be OST Oke SZ val 

4 Ore O79. Oren OZ 

5 Ons Cay OF 96 Sill 

6 0.99 Oh SNS, Oe oS gull 

7 Oe NS Oa OS. On ey 89 

re) 0.94 LONG Ore Shi) 88 

S. Cos Omos Oe NS OZ 

Ae @iisaey) Ole BT ayes, 
WNC) ENC) Ore Oey ORs oe 
weighted avg Oh) Os Oro" Boo 


Recognizing hand-written digits: Confusion matrix


We can also plot a confusion matrix of the true digit values and the predicted digit values.
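What a confusion matrix counts can be shown with a few lines of NumPy. The function below and the toy labels are assumptions made for illustration; in practice the scikit-learn metrics module provides this computation.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        cm[true_label, predicted_label] += 1
    return cm


y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)

# Correct predictions lie on the diagonal, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()
```

Off-diagonal entries show which digits are mistaken for which: here one sample of class 1 was predicted as class 2.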


# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# Digits dataset
digits = datasets.load_digits()

# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# Learn the digits on the train subset
clf.fit(X_train, y_train)

# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)

disp = metrics.plot_confusion_matrix(clf, X_test, y_test)
disp.figure_.suptitle("Confusion Matrix")
print(f"Confusion matrix:\n{disp.confusion_matrix}")

plt.show()


OUTPUT 
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Shows the effect of collinearity in the coefficients of an estimator. Ridge Regression is the estimator used in this 
example. Each color represents a different feature of the coefficient vector, and this is displayed as a function of 
the regularization parameter. This example also shows the usefulness of applying Ridge regression to highly
ill-conditioned matrices. For such matrices, a slight change in the target variable can cause huge variances in the
calculated weights. In such cases, it is useful to set a certain regularization (alpha) to reduce this variation (noise).
When alpha is very large, the regularization effect dominates the squared loss function and the coefficients tend
to zero. At the end of the path, as alpha tends toward zero and the solution tends towards the ordinary least
squares, coefficients exhibit big oscillations. In practice it is necessary to tune alpha in such a way that a balance
is maintained between both.
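The shrinking effect of alpha can be checked with the closed-form ridge solution w = (X^T X + alpha I)^-1 X^T y, which applies when no intercept is fitted. The sketch below is an assumption-laden illustration in NumPy (the function name ridge_coef and the chosen alpha values are not part of the example in this section); it uses the same 10x10 Hilbert matrix.

```python
import numpy as np

# 10x10 Hilbert matrix: entry (i, j) is 1 / (i + j + 1), highly ill-conditioned.
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)


def ridge_coef(X, y, alpha):
    """Closed-form ridge weights without intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


print("condition number of X:", np.linalg.cond(X))
norms = []
for alpha in (1e-2, 1.0, 1e2):
    w = ridge_coef(X, y, alpha)
    norms.append(np.linalg.norm(w))
    print(alpha, norms[-1])
```

The weight norm shrinks as alpha grows, while the very large condition number explains why the unregularized end of the path is so unstable.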


import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

# #############################################################################
# Compute paths

n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)

coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)

# #############################################################################
# Display results

ax = plt.gca()

ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()


OUTPUT 


Bayesian Ridge Regression 


Computes a Bayesian Ridge Regression on a synthetic dataset. See Bayesian Ridge Regression for more
information on the regressor. Compared to the OLS (ordinary least squares) estimator, the coefficient weights are
slightly shifted toward zero, which stabilises them. As the prior on the weights is a Gaussian prior, the histogram
of the estimated weights is Gaussian. The estimation of the model is done by iteratively maximizing the marginal
log-likelihood of the observations. We also plot predictions and uncertainties for Bayesian Ridge Regression for
one-dimensional regression using polynomial feature expansion. Note the uncertainty starts going up on the right
side of the plot. This is because these test samples are outside of the range of the training samples.
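The polynomial feature expansion mentioned above can be produced with np.vander, which turns a 1-D input into columns of decreasing powers. The short sketch below uses made-up inputs purely for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

# np.vander(x, N) has N columns: x**(N-1), ..., x**1, x**0.
features = np.vander(x, 4)
print(features)
```

Note that asking for N columns yields a polynomial of degree N - 1, because the last column is the constant term x**0.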


import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import BayesianRidge, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights
np.random.seed(0)
n_samples, n_features = 100, 100
X = np.random.randn(n_samples, n_features)  # Create Gaussian data
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the Bayesian Ridge Regression and an OLS for comparison
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot true weights, estimated weights, histogram of the weights, and
# predictions with standard deviations
lw = 2
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='lightgreen', linewidth=lw,
         label="Bayesian Ridge estimate")
plt.plot(w, color='gold', linewidth=lw, label="Ground truth")
plt.plot(ols.coef_, color='navy', linestyle='--', label="OLS estimate")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc="best", prop=dict(size=12))

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='gold', log=True,
         edgecolor='black')
plt.scatter(clf.coef_[relevant_features], np.full(len(relevant_features), 5.),
            color='navy', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc="upper left")

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=0.1)
clf_poly = BayesianRidge()
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial Bayesian Ridge Regression", linewidth=lw)
plt.plot(X_plot, y_plot, color='gold', linewidth=lw,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()
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Linear Regression 


The example below uses only the first feature of the diabetes dataset, in order to illustrate the data points within
the two-dimensional plot. The straight line can be seen in the plot, showing how linear regression attempts to
draw a straight line that will best minimize the residual sum of squares between the observed responses in the
dataset, and the responses predicted by the linear approximation. The coefficients, residual sum of squares and
the coefficient of determination are also calculated.
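The two metrics reported in this example can be computed by hand. The sketch below, in plain Python with made-up numbers, follows the usual definitions: the mean squared error is the average squared residual, and the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(residuals) / len(residuals)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, r2)
```

A perfect prediction gives a coefficient of determination of 1, while always predicting the mean of the observed responses gives 0.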


import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# Use only one feature
diabetes_X = diabetes_X[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()


OUTPUT

Coefficients:
 [938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47


Non-negative least squares


In this example, we fit a linear model with positive constraints on the regression coefficients and compare the
estimated coefficients to a classic linear regression.
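One simple way to see what a positivity constraint does is projected gradient descent: take an ordinary least-squares gradient step, then clip negative weights to zero. The sketch below is an illustration under stated assumptions (the function name, the step-size rule and the noise-free toy data are all made up here) and is not the solver used by scikit-learn.

```python
import numpy as np


def nnls_projected_gradient(X, y, n_iter=2000):
    """Minimise 0.5 * ||X w - y||^2 subject to w >= 0."""
    # 1 / (largest singular value)^2 is a safe step size for this objective.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (X @ w - y)
        # Gradient step followed by projection onto the non-negative orthant.
        w = np.maximum(w - step * gradient, 0.0)
    return w


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_coef = np.array([2.0, 0.0, 1.0, 0.0, 3.0])  # non-negative by construction
y = X @ true_coef  # noise-free targets keep the check exact

w = nnls_projected_gradient(X, y)
print(np.round(w, 4))
```

With noisy targets an unconstrained fit would typically assign small negative values to the zero coefficients, whereas the projected solution keeps them at or above zero.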


import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

from sklearn.model_selection import train_test_split

# Generate some random data
np.random.seed(42)

n_samples, n_features = 200, 50
X = np.random.randn(n_samples, n_features)
true_coef = 3 * np.random.randn(n_features)
# Threshold coefficients to render them non-negative
true_coef[true_coef < 0] = 0
y = np.dot(X, true_coef)

# Add some noise
y += 5 * np.random.normal(size=(n_samples, ))

# Split the data in train set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Fit the non-negative least squares
from sklearn.linear_model import LinearRegression

reg_nnls = LinearRegression(positive=True)
y_pred_nnls = reg_nnls.fit(X_train, y_train).predict(X_test)
r2_score_nnls = r2_score(y_test, y_pred_nnls)
print("NNLS R2 score", r2_score_nnls)

# Fit an OLS
reg_ols = LinearRegression()
y_pred_ols = reg_ols.fit(X_train, y_train).predict(X_test)
r2_score_ols = r2_score(y_test, y_pred_ols)
print("OLS R2 score", r2_score_ols)

fig, ax = plt.subplots()
ax.plot(reg_ols.coef_, reg_nnls.coef_, linewidth=0, marker=".")

low_x, high_x = ax.get_xlim()
low_y, high_y = ax.get_ylim()
low = max(low_x, low_y)
high = min(high_x, high_y)
ax.plot([low, high], [low, high], ls="--", c=".3", alpha=.5)
ax.set_xlabel("OLS regression coefficients", fontweight="bold")
ax.set_ylabel("NNLS regression coefficients", fontweight="bold")


OUTPUT
NNLS R2 score 0.7436926291700348
OLS R2 score 0.74369262917003438


Text(0, 0.5, 'NNLS regression coefficients')
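To see what the positivity constraint does, non-negative least squares can be sketched as projected gradient descent: take a gradient step on the squared error, then clip negative coefficients to zero. This is a minimal NumPy illustration with made-up data and a hypothetical helper `nnls_projected_gradient`. It is not the solver that scikit-learn uses internally.

```python
import numpy as np


def nnls_projected_gradient(X, y, n_iter=500):
    """Minimise ||Xw - y||^2 subject to w >= 0 (sketch, not a production solver)."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    lipschitz = np.linalg.norm(X, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (X @ w - y)
        # Gradient step, then projection onto the non-negative orthant
        w = np.maximum(w - gradient / lipschitz, 0.0)
    return w


# Hypothetical data: two of the five true coefficients are exactly zero
rng = np.random.RandomState(42)
X = rng.randn(200, 5)
true_coef = np.array([2.0, 0.0, 1.0, 0.0, 3.0])
y = X @ true_coef + 0.1 * rng.randn(200)

w_nnls = nnls_projected_gradient(X, y)
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("NNLS:", w_nnls)
print("OLS: ", w_ols)
```

The unconstrained fit is free to return small negative values for the zero coefficients, whereas the projected iterate never can.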


Robust linear model estimation using RANSAC


In this example we see how to robustly fit a linear model to faulty data using the RANSAC algorithm.


import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model, datasets

n_samples = 1000
n_outliers = 50

X, y, coef = datasets.make_regression(n_samples=n_samples, n_features=1,
                                      n_informative=1, noise=10,
                                      coef=True, random_state=0)

# Add outlier data
np.random.seed(0)
X[:n_outliers] = 3 + 0.5 * np.random.normal(size=(n_outliers, 1))
y[:n_outliers] = -3 + 10 * np.random.normal(size=n_outliers)

# Fit line using all data
lr = linear_model.LinearRegression()
lr.fit(X, y)

# Robustly fit linear model with RANSAC algorithm
ransac = linear_model.RANSACRegressor()
ransac.fit(X, y)
inlier_mask = ransac.inlier_mask_
outlier_mask = np.logical_not(inlier_mask)

# Predict data of estimated models
line_X = np.arange(X.min(), X.max())[:, np.newaxis]
line_y = lr.predict(line_X)
line_y_ransac = ransac.predict(line_X)

# Compare estimated coefficients
print("Estimated coefficients (true, linear regression, RANSAC):")
print(coef, lr.coef_, ransac.estimator_.coef_)

lw = 2
plt.scatter(X[inlier_mask], y[inlier_mask], color='yellowgreen', marker='.',
            label='Inliers')
plt.scatter(X[outlier_mask], y[outlier_mask], color='gold', marker='.',
            label='Outliers')
plt.plot(line_X, line_y, color='navy', linewidth=lw, label='Linear regressor')
plt.plot(line_X, line_y_ransac, color='cornflowerblue', linewidth=lw,
         label='RANSAC regressor')
plt.legend(loc='lower right')
plt.xlabel("Input")
plt.ylabel("Response")
plt.show()


OUTPUT
Estimated coefficients (true, linear regression, RANSAC):
82.1963908407869 [54.17236387] [82.08533159]
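The idea behind RANSAC fits in a few lines: repeatedly fit a candidate model to a minimal random subset, count how many points agree with it, and refit on the largest consensus set. The sketch below is a simplified illustration for a 1-D line. The helper `ransac_line`, the threshold and the synthetic data are assumptions, and RANSACRegressor has more elaborate stopping and scoring rules.

```python
import numpy as np


def ransac_line(x, y, n_trials=100, threshold=0.5, seed=0):
    """Simplified RANSAC for y = slope * x + intercept (illustrative only)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(x.shape[0], dtype=bool)
    for _ in range(n_trials):
        # Two points are the minimal sample that defines a line
        i, j = rng.choice(x.shape[0], size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # Consensus set: points whose residual is below the threshold
        mask = np.abs(y - (slope * x + intercept)) < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # Final least-squares refit on the best consensus set
    slope, intercept = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return slope, intercept, best_mask


# Hypothetical data: true line y = 3x + 1, with 40 of 200 points corrupted
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 200)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=200)
x[:40] = rng.uniform(3, 5, 40)
y[:40] = 30.0 + 5.0 * rng.normal(size=40)

ols_slope, ols_intercept = np.polyfit(x, y, deg=1)
ransac_slope, ransac_intercept, inlier_mask = ransac_line(x, y)
print("OLS slope:", ols_slope, "RANSAC slope:", ransac_slope)
```

As in the example above, the ordinary fit is pulled towards the outliers while the consensus-based fit stays close to the true slope.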


Regularization path of L1-Logistic Regression
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are exactly 0. When regularization gets progressively looser, coefficients can get non-zero values one after the 
other. Here we choose the liblinear solver because it can efficiently optimize for the Logistic Regression loss with
a non-smooth, sparsity-inducing l1 penalty. Also note that we set a low value for the tolerance to make sure that
the model has converged before collecting the coefficients. We also use warm_start=True, which means that the
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from time import time 
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# Demo path functions 
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three One-vs-Rest (OVR) classifiers are represented by the dashed lines. 


import numpy as np 
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# print the training scores 
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# create a mesh to plot in 

h = .02  # step size in the mesh
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Z = Z4.reshape (xx.shape) 

plt.figure()
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.title("Decision surface of LogisticRegression (%s)" % multi_class)
plt.axis('tight')
# Plot also the training points 
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# Plot the three one-against-all classifiers 
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ymin, ymax = plt.ylim() 
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def line(x0): 
return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1] 
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Comparison of multinomial logistic L1 vs one-versus-rest L1 logistic regression to classify documents from the 
20newsgroups dataset. Multinomial logistic regression yields more accurate results and is faster to train on the
larger scale dataset. Here we use the l1 sparsity that trims the weights of uninformative features to zero. This is
good if the goal is to extract the strongly discriminative vocabulary of each class. If the goal is to get the best
predictive accuracy, it is better to use the non-sparsity-inducing l2 penalty instead. A more traditional (and
possibly better) way to predict on a sparse subset of input features would be to use univariate feature selection
followed by a traditional (l2-penalised) logistic regression model.
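Why an l1 penalty produces exact zeros while an l2 penalty only shrinks can be seen from their proximal operators. The sketch below compares the two on a hypothetical weight vector. It illustrates the mechanism only and is not taken from the example's solver code.

```python
import numpy as np


def soft_threshold(weights, penalty):
    """Proximal operator of penalty * ||w||_1: shrink towards zero, clip at zero."""
    return np.sign(weights) * np.maximum(np.abs(weights) - penalty, 0.0)


def l2_shrink(weights, penalty):
    """Proximal operator of (penalty / 2) * ||w||_2^2: uniform rescaling."""
    return weights / (1.0 + penalty)


weights = np.array([3.0, -0.5, 0.2, -2.0])  # hypothetical weights
l1_weights = soft_threshold(weights, penalty=1.0)
l2_weights = l2_shrink(weights, penalty=1.0)

print("l1:", l1_weights, "non-zero:", np.count_nonzero(l1_weights))
print("l2:", l2_weights, "non-zero:", np.count_nonzero(l2_weights))
```

Weights smaller in magnitude than the penalty are set exactly to zero by the l1 step, whereas the l2 step keeps every weight non-zero.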


import timeit 

import warnings 
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from sklearn.exceptions import ConvergenceWarning 


warnings.filterwarnings ("ignore", category=ConvergenceWarning, 
module="sklearn") 

t0 = timeit.default_timer()

# We use SAGA solver 

solver = 'saga' 

# Turn down for faster run time 
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models == —{'evr'= name: "One versus Rest |) Siters': [155 2, 41}. 
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for model in models: 


# Add initial chance-level values for plotting purpose 


    accuracies = [1 / n_classes]
times = [0] 

densities = [1] 
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# Small number of epochs for fast runtime 
    for this_max_iter in model_params['iters']:
        print('[model=%s, solver=%s] Number of epochs: %s' %
              (model_params['name'], solver, this_max_iter))
        lr = LogisticRegression(solver=solver,
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accuracy = np.sum(y pred == y test) / y test.shape[0] 
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accuracies.append (accuracy) 
densities.append(density) 
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models[model]['times'] = times 
models[model]['densities'] = densities 
models[model] ['accuracies'] = accuracies 
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fig = plt.figure () 
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for model in models: 


    name = models[model]['name']
    times = models[model]['times']
    accuracies = models[model]['accuracies']
    ax.plot(times, accuracies, marker='o',
            label='Model: %s' % name)
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Downloading 2Onews dataset. This may take a few minutes. 

Downloading dataset from https://ndownloader.figshare.com/files/5975967 (14 MB)

Automatically created module for IPython interactive environment

Dataset 20newsgroup, train_samples=9000, n_features=130107, n_classes=20
[model=One versus Rest, solver=saga] Number of epochs: 1
[model=One versus Rest, solver=saga] Number of epochs: 2
[model=One versus Rest, solver=saga] Number of epochs: 4
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Run time (4 epochs) for model ovr:4.02 

[model=Multinomial, solver=saga] Number of epochs: 1 
[model=Multinomial, solver=saga] Number of epochs: 3 
[model=Multinomial, solver=saga] Number of epochs: 7 
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6 non-zero coefficients for model multinomial, per class: 
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Run time (7 epochs) for model multinomial:3.40 
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Sparsity Example: Fitting only features 1 and 2 


Features 1 and 2 of the diabetes-dataset are fitted and plotted below. It illustrates that although feature 2 has a 
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# Plot the figure 
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# Generate the three different figures from different views 
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This example illustrates that GPR with a sum-kernel including a WhiteKernel can estimate the noise level of data. 
An illustration of the log-marginal-likelihood (LML) landscape shows that there exist two local maxima of LML. The
first corresponds to a model with a high noise level and a large length scale, which explains all variations in the
data by noise. The second one has a smaller noise level and shorter length scale, which explains most of the
variation by the noise-free functional relationship. The second model has a higher likelihood; however, depending
on the initial value for the hyperparameters, the gradient-based optimization might also converge to the high-noise 
solution. It is thus important to repeat the optimization several times for different initializations. 
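The quantity being optimized can be written out directly. For an RBF kernel plus white noise, the log-marginal-likelihood is -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi). The sketch below evaluates it with a Cholesky factorisation on hypothetical data. The helpers `rbf_kernel` and `log_marginal_likelihood` are illustrative names, and the signal variance is fixed to 1 for simplicity.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale):
    """Squared-exponential kernel for 1-D inputs stored as column vectors."""
    d2 = (X1[:, None, 0] - X2[None, :, 0]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)


def log_marginal_likelihood(X, y, length_scale, noise_level):
    """LML of a zero-mean GP; noise_level is the white-noise variance."""
    n = X.shape[0]
    K = rbf_kernel(X, X, length_scale) + noise_level * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # sum(log(diag(L))) equals 0.5 * log|K|
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)


# Hypothetical noisy observations of a sine wave
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 20)[:, np.newaxis]
y = 0.5 * np.sin(3 * X[:, 0]) + rng.normal(0, 0.5, X.shape[0])

lml = log_marginal_likelihood(X, y, length_scale=1.0, noise_level=0.1)

# Evaluate the LML on a coarse grid of hyperparameters
length_scales = np.logspace(-2, 3, 6)
noise_levels = np.logspace(-2, 0, 5)
lml_grid = np.array([[log_marginal_likelihood(X, y, ls, nl)
                      for ls in length_scales] for nl in noise_levels])
print(lml, lml_grid.shape)
```

Scanning such a grid is one way to spot several local maxima before trusting a single run of a gradient-based optimizer.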


import numpy as np 

from matplotlib import pyplot as plt 

from matplotlib.colors import LogNorm 
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rng = np.random.RandomState (0) 
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# First run 

plt.figure()

kernel = 1.0 * RBF(length_scale=100.0, length_scale_bounds=(1e-2, 1e3)) \
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gp = GaussianProcessRegressor (kernel=kernel, 
                              alpha=0.0).fit(X, y)
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alpha=0.5, color='k') 
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plt.scatter(X[:, 0], y, c='r', s=50, zorder=10, edgecolors=(0, 0, Q)) 
plt.title("Initial: %s\nOptimum: %s\nLog-Marginal-Likelihood: %s" 
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# Second run 

plt.figure () 
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gp = GaussianProcessRegressor (kernel=kernel, 
                              alpha=0.0).fit(X, y)
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alpha=0.5, color='k') 
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plt.scatter(X[:, 0], y, c='r', s=50, zorder=10, edgecolors=(0, 0, Q)) 
plt.title("Initial: %s\nOptimum: %s\nLog-Marginal-Likelihood: %s"
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# Plot LML landscape 

plt.figure()
theta0 = np.logspace(-2, 3, 49)
theta1 = np.logspace(-2, 0, 50)
Theta0, Theta1 = np.meshgrid(theta0, theta1)
LML = [[gp.log_marginal_likelihood(np.log([0.36, Theta0[i, j], Theta1[i, j]]))
        for i in range(Theta0.shape[0])] for j in range(Theta0.shape[1])]
LML = np.array(LML).T

vmin, vmax = (-LML).min(), (-LML).max()
vmax = 50
level = np.around(np.logspace(np.log10(vmin), np.log10(vmax), 50), decimals=1)
plt.contour(Theta0, Theta1, -LML,
            levels=level, norm=LogNorm(vmin=vmin, vmax=vmax))
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plt.xlabel ("Length-scale") 
plt.ylabel ("Noise-level") 
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plt.tight layout () 
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Comparison of kernel ridge and Gaussian process regression 


Both kernel ridge regression (KRR) and Gaussian process regression (GPR) learn a target function by employing 
internally the “kernel trick”. KRR learns a linear function in the space induced by the respective kernel which 
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on the mean-squared error loss with ridge regularization. GPR uses the kernel to define the covariance of a prior 
distribution over the target functions and uses the observed training data to define a likelihood function. Based on
Bayes' theorem, a (Gaussian) posterior distribution over target functions is defined, whose mean is used for
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rng = np.random.RandomState (0) 
# Generate sample data 
X = 15 * rng.rand(100, 1)
y = np.sin(X).ravel()
y += 3 * (0.5 - rng.rand(X.shape[0]))  # add noise
# Fit KernelRidge with parameter selection based on 5-fold cross validation 
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stime = time.time() 
kr.fit(X, y)
print("Time for KRR fitting: %.3f" % (time.time() - stime))
gp_kernel = ExpSineSquared(1.0, 5.0, periodicity_bounds=(1e-2, 1e1)) \
    + WhiteKernel(1e-1)
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# Predict using kernel ridge 
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stime = time.time() 
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print ("Time for KRR prediction: %.3f£" % (time.time() - stime) ) 


stime = time.time()
y_gpr = gpr.predict(X_plot, return_std=False)
print("Time for GPR prediction: %.3f"
      % (time.time() - stime))

stime = time.time()
y_gpr, y_std = gpr.predict(X_plot, return_std=True)
print("Time for GPR prediction with standard-deviation: %.3f"
      % (time.time() - stime))

# Plot results
plt.figure(figsize=(10, 5))
lw = 2
plt.scatter(X, y, c='k', label='data')
plt.plot(X_plot, np.sin(X_plot), color='navy', lw=lw, label='True')
plt.plot(X_plot, y_kr, color='turquoise', lw=lw,
         label='KRR (%s)' % kr.best_params_)
plt.plot(X_plot, y_gpr, color='darkorange', lw=lw,
         label='GPR (%s)' % gpr.kernel_)
plt.fill_between(X_plot[:, 0], y_gpr - y_std, y_gpr + y_std, color='darkorange',
                 alpha=0.2)
plt.xlabel('data')
plt.ylabel('target')
plt.xlim(0, 20)
plt.ylim(-4, 4)
plt.title('GPR versus Kernel Ridge')
plt.legend(loc="best", scatterpoints=1, prop={'size': 8})
plt.show()

OUTPUT
Time for KRR fitting: ...
Time for GPR fitting: ...
Time for KRR prediction: ...
Time for GPR prediction: ...
Time for GPR prediction with standard-deviation: 0.084


[Figure: GPR versus Kernel Ridge. Legend: True; KRR ({'alpha': 0.001, 'kernel': ExpSineSquared(length_scale=4.64, periodicity=12.9)}); GPR (ExpSineSquared(length_scale=1.53, periodicity=6.15) + WhiteKernel(noise_level=0.699)). Axes: data, target.]
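Kernel ridge regression has a closed-form solution: the dual coefficients solve (K + alpha I) c = y, and predictions are kernel-weighted sums of those coefficients. The NumPy sketch below uses a periodic kernel of the ExpSineSquared form with the hyperparameters reported in the figure legend. The helper `exp_sine_squared` and the variable names are illustrative assumptions, not the library implementation.

```python
import numpy as np


def exp_sine_squared(a, b, length_scale, periodicity):
    """Periodic kernel: exp(-2 * sin^2(pi * |a - b| / p) / l^2) for 1-D inputs."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / periodicity) ** 2 / length_scale ** 2)


# Hypothetical noisy sine data, similar in spirit to the example above
rng = np.random.RandomState(0)
x_train = 15 * rng.rand(100)
y_train = np.sin(x_train) + 3 * (0.5 - rng.rand(100))

ridge = 0.001  # regularization strength (called alpha in KernelRidge)
K = exp_sine_squared(x_train, x_train, length_scale=4.64, periodicity=12.9)

# Closed-form dual coefficients: (K + ridge * I) c = y
dual_coef = np.linalg.solve(K + ridge * np.eye(100), y_train)

# Predict on new inputs, including the extrapolation range beyond the data
x_plot = np.linspace(0, 20, 50)
y_plot = exp_sine_squared(x_plot, x_train, 4.64, 12.9) @ dual_coef
print(y_plot[:5])
```

Because the kernel is periodic, the fitted function repeats itself beyond the training range, which is why both models in the figure can extrapolate.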





Probabilistic predictions with Gaussian process classification (GPC) 


This example illustrates the predicted probability of GPC for an RBF kernel with different choices of the
hyperparameters. The first figure shows the predicted probability of GPC with arbitrarily chosen hyperparameters
and with the hyperparameters corresponding to the maximum log-marginal-likelihood (LML). While the
hyperparameters chosen by optimizing LML have a considerably larger LML, they perform slightly worse
according to the log-loss on test data. The figure shows that this is because they exhibit a steep change of the
class probabilities at the class boundaries (which is good) but have predicted probabilities close to 0.5 far away
from the class boundaries (which is bad). This undesirable effect is caused by the Laplace approximation used
internally by GPC.
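The log-loss penalizes hedged probabilities even when the predicted class is correct. The short sketch below shows this with a hand-written binary log-loss on hypothetical probabilities. The helper name `binary_log_loss` is an assumption.

```python
import numpy as np


def binary_log_loss(y_true, p_pred):
    """Average negative log-likelihood of the true labels."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))


y_true = np.array([1, 1, 0, 0])

# Both models classify every point correctly at a 0.5 threshold
confident_loss = binary_log_loss(y_true, [0.9, 0.9, 0.1, 0.1])
hedging_loss = binary_log_loss(y_true, [0.55, 0.55, 0.45, 0.45])
print(confident_loss, hedging_loss)
```

Two models with identical accuracy can therefore differ markedly in log-loss, which is the effect described above for probabilities that stay close to 0.5.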


import numpy as np

from matplotlib import pyplot as plt

from sklearn.metrics import accuracy_score, log_loss
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF


# Generate data
train_size = 50
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 100)[:, np.newaxis]
y = np.array(X[:, 0] > 2.5, dtype=int)

# Specify Gaussian Processes with fixed and optimized hyperparameters
gp_fix = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                   optimizer=None)
gp_fix.fit(X[:train_size], y[:train_size])

gp_opt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp_opt.fit(X[:train_size], y[:train_size])

print("Log Marginal Likelihood (initial): %.3f"
      % gp_fix.log_marginal_likelihood(gp_fix.kernel_.theta))
print("Log Marginal Likelihood (optimized): %.3f"
      % gp_opt.log_marginal_likelihood(gp_opt.kernel_.theta))

print("Accuracy: %.3f (initial) %.3f (optimized)"
      % (accuracy_score(y[:train_size], gp_fix.predict(X[:train_size])),
         accuracy_score(y[:train_size], gp_opt.predict(X[:train_size]))))
print("Log-loss: %.3f (initial) %.3f (optimized)"
      % (log_loss(y[:train_size], gp_fix.predict_proba(X[:train_size])[:, 1]),
         log_loss(y[:train_size], gp_opt.predict_proba(X[:train_size])[:, 1])))


# Plot posteriors
plt.figure()
plt.scatter(X[:train_size, 0], y[:train_size], c='k', label="Train data",
            edgecolors=(0, 0, 0))
plt.scatter(X[train_size:, 0], y[train_size:], c='g', label="Test data",
            edgecolors=(0, 0, 0))
X_ = np.linspace(0, 5, 100)
plt.plot(X_, gp_fix.predict_proba(X_[:, np.newaxis])[:, 1], 'r',
         label="Initial kernel: %s" % gp_fix.kernel_)
plt.plot(X_, gp_opt.predict_proba(X_[:, np.newaxis])[:, 1], 'b',
         label="Optimized kernel: %s" % gp_opt.kernel_)
plt.xlabel("Feature")
plt.ylabel("Class 1 probability")
plt.xlim(0, 5)
plt.ylim(-0.25, 1.5)
plt.legend(loc="best")

# Plot LML landscape
plt.figure()
theta0 = np.logspace(0, 8, 30)
theta1 = np.logspace(-1, 1, 29)
Theta0, Theta1 = np.meshgrid(theta0, theta1)
LML = [[gp_opt.log_marginal_likelihood(np.log([Theta0[i, j], Theta1[i, j]]))
        for i in range(Theta0.shape[0])] for j in range(Theta0.shape[1])]
LML = np.array(LML).T
plt.plot(np.exp(gp_fix.kernel_.theta)[0], np.exp(gp_fix.kernel_.theta)[1],
         'ko', zorder=10)
plt.plot(np.exp(gp_opt.kernel_.theta)[0], np.exp(gp_opt.kernel_.theta)[1],
         'ko', zorder=10)
plt.pcolor(Theta0, Theta1, LML)
plt.xscale("log")
plt.yscale("log")
plt.colorbar()
plt.xlabel("Magnitude")
plt.ylabel("Length-scale")
plt.title("Log-marginal-likelihood")

plt.show()
Log Marginal Likelihood (initial): ...
Log Marginal Likelihood (optimized): ...
Accuracy: ... (initial) ... (optimized)
Log-loss: ... (initial) ... (optimized)


[Figure legend: Train data; Test data; Initial kernel: 1**2 * RBF(length_scale=1); Optimized kernel: 66.3**2 * RBF(length_scale=1.33)]

















Illustration of Gaussian process classification (GPC) on the XOR 
dataset 


This example illustrates GPC on XOR data. Compared are a stationary, isotropic kernel (RBF) and a non-stationary
kernel (DotProduct). On this particular dataset, the DotProduct kernel obtains considerably better
results because the class-boundaries are linear and coincide with the coordinate axes. In general, stationary
kernels often obtain better results.
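One way to see why a squared dot-product kernel suits XOR data: with a zero offset, (a . b)^2 equals an ordinary dot product in the feature space (x1^2, sqrt(2) x1 x2, x2^2), and the sign of the cross term x1 x2 alone separates the XOR classes. The sketch below checks both facts on synthetic data. It assumes sigma_0 = 0 for simplicity, which is close to, but not exactly, the fitted value reported for this example.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)


def squared_dot_kernel(a, b):
    """Squared dot-product kernel with zero offset (assumed sigma_0 = 0)."""
    return (a @ b) ** 2


def feature_map(x):
    """Explicit features whose inner product reproduces the kernel above."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])


a, b = X[0], X[1]
kernel_value = squared_dot_kernel(a, b)
feature_value = feature_map(a) @ feature_map(b)

# The cross term is negative exactly when the two coordinates differ in sign
predicted = X[:, 0] * X[:, 1] < 0
print(kernel_value, feature_value, np.mean(predicted == Y))
```

In that feature space the XOR problem is linearly separable, which the RBF kernel has to approximate with local bumps instead.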


import numpy as np
import matplotlib.pyplot as plt

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct


xx, yy = np.meshgrid(np.linspace(-3, 3, 50),
                     np.linspace(-3, 3, 50))
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# fit the model
plt.figure(figsize=(10, 5))
kernels = [1.0 * RBF(length_scale=1.0), 1.0 * DotProduct(sigma_0=1.0)**2]
for i, kernel in enumerate(kernels):
    clf = GaussianProcessClassifier(kernel=kernel, warm_start=True).fit(X, Y)

    # plot the decision function for each datapoint on the grid
    Z = clf.predict_proba(np.vstack((xx.ravel(), yy.ravel())).T)[:, 1]
    Z = Z.reshape(xx.shape)

    plt.subplot(1, 2, i + 1)
    image = plt.imshow(Z, interpolation='nearest',
                       extent=(xx.min(), xx.max(), yy.min(), yy.max()),
                       aspect='auto', origin='lower', cmap=plt.cm.PuOr_r)
    contours = plt.contour(xx, yy, Z, levels=[0.5], linewidths=2,
                           colors=['k'])
    plt.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired,
                edgecolors=(0, 0, 0))
    plt.xticks(())
    plt.yticks(())
    plt.axis([-3, 3, -3, 3])
    plt.colorbar(image)
    plt.title("%s\n Log-Marginal-Likelihood:%.3f"
              % (clf.kernel_, clf.log_marginal_likelihood(clf.kernel_.theta)),
              fontsize=12)

plt.tight_layout()
plt.show()


[Figure: two panels. Left: 316**2 * RBF(length_scale=1.25), Log-Marginal-Likelihood: -23.674. Right: 316**2 * DotProduct(sigma_0=0.0104) ** 2, Log-Marginal-Likelihood: -9.284.]
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Gaussian process classification (GPC) on iris dataset 


This example illustrates the predicted probability of GPC for an isotropic and anisotropic RBF kernel on a two-
dimensional version for the iris-dataset. The anisotropic RBF kernel obtains slightly higher log-marginal-likelihood
by assigning different length-scales to the two feature dimensions.


import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = np.array(iris.target, dtype=int)

h = .02  # step size in the mesh

kernel = 1.0 * RBF([1.0])
gpc_rbf_isotropic = GaussianProcessClassifier(kernel=kernel).fit(X, y)
kernel = 1.0 * RBF([1.0, 1.0])
gpc_rbf_anisotropic = GaussianProcessClassifier(kernel=kernel).fit(X, y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

titles = ["Isotropic RBF", "Anisotropic RBF"]
plt.figure(figsize=(10, 5))
for i, clf in enumerate((gpc_rbf_isotropic, gpc_rbf_anisotropic)):
    # Plot the predicted probabilities. For that, we will assign a color to
    # each point in the mesh [x_min, x_max] x [y_min, y_max].
    plt.subplot(1, 2, i + 1)

    Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape((xx.shape[0], xx.shape[1], 3))
    plt.imshow(Z, extent=(x_min, x_max, y_min, y_max), origin="lower")

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=np.array(["r", "g", "b"])[y],
                edgecolors=(0, 0, 0))
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())
    plt.title("%s, LML: %.3f" %
              (titles[i], clf.log_marginal_likelihood(clf.kernel_.theta)))

plt.tight_layout()
plt.show()


[Figure panel title: Anisotropic RBF, LML: -47.888]
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Dye TaKsyiAvan =esyil pare tie lame) @r-Mer-lUrstir-lamanlbdiula> 


Plot the density estimation of a mixture of two Gaussians. Data is generated from two Gaussians with different
centers and covariance matrices.
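The score that such a density plot visualizes is the log of a weighted sum of Gaussian densities. The sketch below evaluates it with NumPy for a two-component mixture whose parameters are assumed for illustration: they mimic the data-generating process of this example rather than a fitted model. The helpers `gaussian_log_pdf` and `mixture_log_likelihood` are hypothetical names.

```python
import numpy as np


def gaussian_log_pdf(points, mean, cov):
    """Log density of a multivariate normal, evaluated row by row."""
    d = mean.shape[0]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (points - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2 * np.pi) + log_det + (z ** 2).sum(axis=0))


def mixture_log_likelihood(points, weights, means, covs):
    """log sum_k w_k N(x | mean_k, cov_k), computed with a stable log-sum-exp."""
    log_terms = np.stack([np.log(w) + gaussian_log_pdf(points, m, c)
                          for w, m, c in zip(weights, means, covs)])
    top = log_terms.max(axis=0)
    return top + np.log(np.exp(log_terms - top).sum(axis=0))


# Assumed parameters: a spherical blob at (20, 20) and a stretched blob at the origin
C = np.array([[0., -0.7], [3.5, .7]])
weights = [0.5, 0.5]
means = [np.array([20.0, 20.0]), np.array([0.0, 0.0])]
covs = [np.eye(2), C.T @ C]

points = np.array([[20.0, 20.0], [0.0, 0.0], [10.0, 10.0]])
negative_log_likelihood = -mixture_log_likelihood(points, weights, means, covs)
print(negative_log_likelihood)
```

Points near either center get a low negative log-likelihood, while the point halfway between the two blobs gets a much higher one.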


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300

# generate random sample, two components
np.random.seed(0)

# generate spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

# generate zero centered stretched Gaussian data
C = np.array([[0., -0.7], [3.5, .7]])
stretched_gaussian = np.dot(np.random.randn(n_samples, 2), C)

# concatenate the two datasets into the final training set
X_train = np.vstack([shifted_gaussian, stretched_gaussian])

# fit a Gaussian Mixture Model with two components
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X_train)

# display predicted scores by the model as a contour plot
x = np.linspace(-20., 30.)
y = np.linspace(-20., 40.)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX)
Z = Z.reshape(X.shape)

CS = plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=1000.0),
                 levels=np.logspace(0, 3, 10))
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.scatter(X_train[:, 0], X_train[:, 1], .8)

plt.title('Negative log-likelihood predicted by a GMM')
plt.axis('tight')
plt.show()
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GIVI even eslatslalersts 


Demonstration of several covariance types for Gaussian mixture models. See Gaussian mixture models for more
information on the estimator. Although GMMs are often used for clustering, we can compare the obtained clusters
with the actual classes from the dataset. We initialize the means of the Gaussians with the means of the classes
from the training set to make this comparison valid. We plot predicted labels on both training and held out test
data using a variety of GMM covariance types on the iris dataset. We compare GMMs with spherical, diagonal,
full, and tied covariance matrices in increasing order of performance. Although one would expect full covariance
to perform best in general, it is prone to overfitting on small datasets and does not generalize well to held out test
data. On the plots, train data is shown as dots, while test data is shown as crosses. The iris dataset is four-
dimensional. Only the first two dimensions are shown here, and thus some points are separated in other
dimensions.
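The overfitting risk mentioned above follows from how many free covariance parameters each type has. The sketch below counts them for the iris setting (three components, four features). The helper `covariance_parameter_count` is a hypothetical name introduced only for this illustration.

```python
def covariance_parameter_count(covariance_type, n_components, n_features):
    """Number of free covariance parameters for each GMM covariance type."""
    # A symmetric d x d matrix has d * (d + 1) / 2 free entries
    per_full = n_features * (n_features + 1) // 2
    counts = {
        'spherical': n_components,             # one variance per component
        'diag': n_components * n_features,     # one variance per feature and component
        'tied': per_full,                      # one full matrix shared by all components
        'full': n_components * per_full,       # one full matrix per component
    }
    return counts[covariance_type]


iris_counts = {name: covariance_parameter_count(name, n_components=3, n_features=4)
               for name in ['spherical', 'diag', 'tied', 'full']}
print(iris_counts)
```

With only about a hundred training samples, the thirty covariance parameters of the full model leave far less data per parameter than the three of the spherical model.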


import matplotlib as mpl
import matplotlib.pyplot as plt

import numpy as np

from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold


colors = ['navy', 'turquoise', 'darkorange']


def make_ellipses(gmm, ax):
    for n, color in enumerate(colors):
        if gmm.covariance_type == 'full':
            covariances = gmm.covariances_[n][:2, :2]
        elif gmm.covariance_type == 'tied':
            covariances = gmm.covariances_[:2, :2]
        elif gmm.covariance_type == 'diag':
            covariances = np.diag(gmm.covariances_[n][:2])
        elif gmm.covariance_type == 'spherical':
            covariances = np.eye(gmm.means_.shape[1]) * gmm.covariances_[n]
        v, w = np.linalg.eigh(covariances)
        u = w[0] / np.linalg.norm(w[0])
        angle = np.arctan2(u[1], u[0])
        angle = 180 * angle / np.pi  # convert to degrees
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        # angle is passed by keyword, as recent matplotlib versions require
        ell = mpl.patches.Ellipse(gmm.means_[n, :2], v[0], v[1],
                                  angle=180 + angle, color=color)
        ell.set_clip_box(ax.bbox)
        ell.set_alpha(0.5)
        ax.add_artist(ell)
        ax.set_aspect('equal', 'datalim')


iris = datasets.load_iris()

# Break up the dataset into non-overlapping training (75%) and testing
# (25%) sets.
skf = StratifiedKFold(n_splits=4)
# Only take the first fold.
train_index, test_index = next(iter(skf.split(iris.data, iris.target)))

X_train = iris.data[train_index]
y_train = iris.target[train_index]
X_test = iris.data[test_index]
y_test = iris.target[test_index]

n_classes = len(np.unique(y_train))

# Try GMMs using different types of covariances.
estimators = {cov_type: GaussianMixture(n_components=n_classes,
              covariance_type=cov_type, max_iter=20, random_state=0)
              for cov_type in ['spherical', 'diag', 'tied', 'full']}

n_estimators = len(estimators)

plt.figure(figsize=(3 * n_estimators // 2, 6))
plt.subplots_adjust(bottom=.01, top=0.95, hspace=.15, wspace=.05,
                    left=.01, right=.99)


for index, (name, estimator) in enumerate(estimators.items()):
    # Since we have class labels for the training data, we can
    # initialize the GMM parameters in a supervised manner.
    estimator.means_init = np.array([X_train[y_train == i].mean(axis=0)
                                     for i in range(n_classes)])

    # Train the other parameters using the EM algorithm.
    estimator.fit(X_train)

    h = plt.subplot(2, n_estimators // 2, index + 1)
    make_ellipses(estimator, h)

    for n, color in enumerate(colors):
        data = iris.data[iris.target == n]
        plt.scatter(data[:, 0], data[:, 1], s=0.8, color=color,
                    label=iris.target_names[n])
    # Plot the test data with crosses
    for n, color in enumerate(colors):
        data = X_test[y_test == n]
        plt.scatter(data[:, 0], data[:, 1], marker='x', color=color)

    y_train_pred = estimator.predict(X_train)
    train_accuracy = np.mean(y_train_pred.ravel() == y_train.ravel()) * 100
    plt.text(0.05, 0.9, 'Train accuracy: %.1f' % train_accuracy,
             transform=h.transAxes)

    y_test_pred = estimator.predict(X_test)
    test_accuracy = np.mean(y_test_pred.ravel() == y_test.ravel()) * 100
    plt.text(0.05, 0.8, 'Test accuracy: %.1f' % test_accuracy,
             transform=h.transAxes)

    plt.xticks(())
    plt.yticks(())
    plt.title(name)

plt.legend(scatterpoints=1, loc='lower right', prop=dict(size=12))
plt.show()


OUTPUT


Automatically created module for IPython interactive environment


[Figure: one panel per covariance type, each annotated with its accuracies; legend: setosa, versicolor, virginica]
Train accuracy: 88.4    Test accuracy: 92.1
Train accuracy: 95.5    Test accuracy: 100.0
Train accuracy: 93.8    Test accuracy: 89.5
Train accuracy: 94.6    Test accuracy: 97.4





Gaussian Mixture Model Ellipsoids


Plot the confidence ellipsoids of a mixture of two Gaussians obtained with Expectation Maximisation
(GaussianMixture class) and Variational Inference (BayesianGaussianMixture class models with a Dirichlet
process prior).


Both models have access to five components with which to fit the data. Note that the Expectation Maximisation
model will necessarily use all five components while the Variational Inference model will effectively only use as
many as are needed for a good fit. Here we can see that the Expectation Maximisation model splits some
components arbitrarily, because it is trying to fit too many components, while the Dirichlet Process model adapts
its number of states automatically.
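The ellipsoids are derived from the eigen-decomposition of each covariance matrix: the eigenvectors give the axis directions and the square roots of the eigenvalues give the axis lengths. The sketch below works this out for a hand-picked covariance. The helper `covariance_ellipse` is an assumption, and it takes the eigenvector column of the largest eigenvalue as the major axis, which may differ from how a particular listing indexes the eigenvector matrix.

```python
import numpy as np


def covariance_ellipse(cov):
    """Return (width, height, angle in degrees) of the ellipse for a 2x2 covariance."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = eigenvectors[:, -1]                      # direction of largest variance
    # Eigenvectors are defined up to sign, so fold the angle into [0, 180)
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    # Same scaling as in the plotting helpers: 2 * sqrt(2) * sqrt(eigenvalue)
    width, height = 2.0 * np.sqrt(2.0) * np.sqrt(eigenvalues[::-1])
    return width, height, angle


# Hypothetical covariance with positive correlation: eigenvalues are 1 and 3
width, height, angle = covariance_ellipse(np.array([[2.0, 1.0], [1.0, 2.0]]))
print(width, height, angle)
```

For this covariance the major axis lies along the diagonal, so the ellipse is tilted by 45 degrees and is about 1.7 times longer than it is wide.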


import itertools

import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl

from sklearn import mixture

color_iter = itertools.cycle(['navy', 'c', 'cornflowerblue', 'gold',
                              'darkorange'])


def plot_results(X, Y_, means, covariances, index, title):
    splot = plt.subplot(2, 1, 1 + index)
    for i, (mean, covar, color) in enumerate(zip(
            means, covariances, color_iter)):
        v, w = linalg.eigh(covar)
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        u = w[0] / linalg.norm(w[0])
        # as the DP will not use every component it has access to
        # unless it needs it, we shouldn't plot the redundant
        # components.
        if not np.any(Y_ == i):
            continue
        plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], .8, color=color)

        # Plot an ellipse to show the Gaussian component
        angle = np.arctan(u[1] / u[0])
        angle = 180. * angle / np.pi  # convert to degrees
        # angle is passed by keyword, as recent matplotlib versions require
        ell = mpl.patches.Ellipse(mean, v[0], v[1], angle=180. + angle,
                                  color=color)
        ell.set_clip_box(splot.bbox)
        ell.set_alpha(0.5)
        splot.add_artist(ell)

    plt.xlim(-9., 5.)
    plt.ylim(-3., 6.)
    plt.xticks(())
    plt.yticks(())
    plt.title(title)


# Number of samples per component
n_samples = 500

# Generate random sample, two components
np.random.seed(0)
C = np.array([[0., -0.1], [1.7, .4]])
X = np.r_[np.dot(np.random.randn(n_samples, 2), C),
          .7 * np.random.randn(n_samples, 2) + np.array([-6, 3])]

# Fit a Gaussian mixture with EM using five components
gmm = mixture.GaussianMixture(n_components=5, covariance_type='full').fit(X)
plot_results(X, gmm.predict(X), gmm.means_, gmm.covariances_, 0,
             'Gaussian Mixture')

# Fit a Dirichlet process Gaussian mixture using five components
dpgmm = mixture.BayesianGaussianMixture(n_components=5,
                                        covariance_type='full').fit(X)
plot_results(X, dpgmm.predict(X), dpgmm.means_, dpgmm.covariances_, 1,
             'Bayesian Gaussian Mixture with a Dirichlet process prior')

plt.show()
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